Introduction


Programming in machine code has never been more popular. 
Almost completely dismissed a few years ago as a tedious 
anachronism, machine code is recognised today as essential in all 
programs which need absolute speed and control. 





For over four years, PCW has published a regular series, PCW 
Sub Set, devoted entirely to assembly language programming. 
Here for the first time the best 6502 routines and programming 
hints from the series are collected together in one volume, along 
with 6502 conversions of some of the Z-80 routines. 


All the routines have been rechecked to find those elusive bugs 
that somehow managed to get into the original magazine versions. 
The documentation and listing format has been completely 
revised to new Sub Set standards, giving as much information as 
possible about the routines and the way that they work. 


PCW SUB SET 


The following letter is an edited version of that which appeared
under the heading of ‘Software datasheets’ in the COMMUNICATION
section of PCW in May 1980. Although it was aimed primarily at the
Z-80 programmer, other codes (including the 6502) were not
forgotten. A 6502 version of the sample routine can be found in
chapter one.






SOFTWARE DATASHEETS

Nothing makes me so wild as to hear computer hobbyists being
told that any program that more or less works is a good one. Bad
programming, whether perpetrated by professionals or amateurs, is
an abomination that pollutes the mental processes of anyone,
including those who actually wrote it, who might later want to use
and modify it. Hobbyists do not have to use machine code
programming of anything but the highest quality.

Any software takes time to develop and test into something the
originator can use. Just a little extra time could turn it into
something that could be shared and improved by others, so that
everyone finishes up with a set of first class software products.

I would like, through PCW, to get a group of people writing Z-80
general purpose subroutines, to defined standards, for criticism
and improvement by others. I
would edit and check that 
contributions worked and 
conformed to standard and even 
supply routines, if necessary, for 
the first few months, to get the 
project off the ground. The ideas 
of the Z-80 routines could, of 
course, be worked in 6502, 6800 
and other machine codes as well. 


To illustrate the standards I have 
in mind, I list Rules and 
Documentation standards for the 
routines, together with an 
example. I would try to improve 
on the presentation of these, 
perhaps PCW readers could help 
with this. 


Rules and Documentation for 
Software Datasheets for Z-80 
general purpose subroutines. 
(Developed from the paper 
‘Microcomputer Software 
Design’ by Thomas P. Hughes, 
Dwight H. Sawin III and David 
R. Hadden Jr. of the U.S. Army 
Electronics Command.) 


RULES 


1. Registers not being used to 
convey data into or out of the 
routine will, if used by the 
routine, be saved on entry to and 
restored before exit from the 
routine. 


;= DL1S - One second delay at 2MHz
;/ 'DL1S' - Level 0
;/ To use 2000000 time states, inclusive of call,
;/ without other effect.
;/ ACTION:  (SP-2) := L
;/          (SP-1) := H
;/          H := (SP+1)
;/          L := (SP)
;/          repeated 42,551 times
;/ INPUT: None
;/ OUTPUT: None
;/ REGS USED: None
;/ STACK USE: 6
;/ LENGTH: 19
;/ SUBr DEPENDENCIES: None
;/ INTERFACES: None
;/ 8080 COMPATIBLE?: No

DL1S  PUSH AF        ; save flags                    F5
      PUSH BC        ; & registers                   C5
      LD BC,42551    ; set loop counter              01 37 A6
      PUSH HL        ; main                          E5
      POP HL         ; delay                         E1
      DEC BC         ; decrement counter             0B
      LD A,C         ; set zero flag only if both    79
      OR B           ; bytes of BC are zero          B0
      JR NZ,-7       ; jump if not zero to loop      20 F9
      PUSH HL        ; make up                       E5
      POP HL         ; delay to                      E1
      NOP            ; 1,999,983                     00
      NOP            ; time states                   00
      POP BC         ; restore registers             C1
      POP AF         ; and flags                     F1
      RET            ; return                        C9

2. It is assumed that the general
routines library will always be in 
memory (possibly ROM) so that 
routines may call other general 
routines. 

3. RAM addresses, outside the 
general routines’ library, will 
never be explicitly specified in 
routines. References to RAM 
may be made through the 
contents of registers, which the 
caller supplies as pointers or as 
address parameters immediately 
following the call in the main 
routine. 

4. Registers HL, DE, IX and IY 
will be used as pointers to RAM. 
5. Registers B and BC will be 
used to pass single and double 
byte counts. 

6. To avoid having areas of RAM that need to be defined by the
user, the stack may be used for local RAM.

7. Data may be supplied to a subroutine as parameters
immediately following the call in the main routine.

8. The alternate register set will not be used by routines, to leave
it available for processing interrupts.

9. Routines that call no other routines are classed as level 0 and
all others as level 1.


DOCUMENTATION


1. The first part of the documentation, marked ‘;=’, contains a
brief textual description of the routine.

2. The second part, marked ‘;/’, contains a technical description in
eleven sections developed after a format used by Nicaud.

3. The third part is a complete listing of the routine, with
assembler mnemonics, comments and object code.


Alan Tootill,
Enfield,
Middlesex.


A very interesting letter. I agree with Alan’s sentiments and would
like to help him move this idea forward. The magazine welcomes
all ideas, proposals and modifications from interested readers;
some we'll publish and all will be passed to Alan. Please write
stating your area of interest and likely involvement, marking your
envelope ‘standards’. If you do not wish your letter to be printed
please write ‘not to be published’ on it. I look forward to hearing
from you - Ed.


With the help and encouragement of David Tebbutt, then Editor of
PCW, and a quick response from several Z-80 machine code
enthusiasts, PCW Sub Set began as a regular series in September
1980. It was presented by Alan Tootill until July 1984. Unlike other
sections of PCW, contributions to Sub Set went unpaid until April
1982. From the start everyone was prepared to share their routines
and ideas freely, accepting the criticism and improvement of their
work by other contributors as reward enough.


THE FIRST SUB SET 


Most of the September article was given over to a summary of the
criticism received by Alan of the proposed ‘Rules and
Documentation’. Then, as now, contributors to Sub Set didn’t want
to be too restricted in the standard of code sent in. This was the
beginning of Sub Set’s renowned classification system with routines
which met the rigorous standards being awarded Class 1 status -
others being published as Class 2 routines.


Another innovation in the September 1980 issue concerned the 
Datasheet method of indicating jumps and calls to absolute 
addresses. Reference to addresses in the general routine area would 
be by means of labels in the operand field and by the dummy 
symbols ‘XX XX’ in the machine code. Reference to any other area of 
memory - in particular to addresses within the routine - would also 
use labels in the assembler listing but in this case the symbols ‘YY YY’ 
would take the place of the address in the code. 


A new section was added to the documentation saying if the routine 
was ‘Time critical’. If so then the description had also to declare 
either the exact number or the maximum number of Time states 
(system clock cycles) used by the routine. The declaration would be 
optional for routines not time critical. 


Documentation changes 


Minor revisions were made to the documentation of Sub Set
Datasheets at irregular intervals in the succeeding months.
Generally, however, the initial format suited Z-80 routines quite
well and for the first few months Z-80 was the only code being sent
in by readers. It was not until September 1981 that codes other than
Z-80 appeared in any quantity.


Sub Set standards were perfectly adequate for the 6809 which, like
the Z-80, has enough registers to deal with parameter input. But the
6502, with only three 8-bit registers, has to use specific memory
locations in Page Zero as pseudo-registers for all but the simplest
routine. Even parameters embedded in the program after the
subroutine call are inaccessible without using Page Zero.
Unfortunately, because of system use, no specific Page Zero
addresses can be guaranteed available. The solution adopted in Sub
Set was to designate an anonymous block of sixteen Page Zero bytes
as honorary general purpose ‘registers’. The notation for these in
the assembler listing would be ‘M0’ to ‘MF’ - symbolised in the
machine code by ‘zz’.


With minor exceptions, Sub Set steered clear of code written for 
specific computer systems during its first four years. But, since the 
hardware in different computers is becoming increasingly diverse 
and incompatible, machine specific routines have been included in 
the series since September 1984. This shift of emphasis prompted a 
rearrangement of the Datasheet format to (a) highlight the specific 
system each routine is written for and (b) identify the routine’s more 
general features as an aid to its possible conversion for other 
computers. 


DATASHEET DOCUMENTATION 


Sub Set Datasheets have three parts: name, description and 
assembler listing. 


The name of the routine consists of the assembler label (marked by a
preceding ‘=’) and verbal expansion. This part should also contain
the labels of any other entry points to the routine, these being
preceded by ‘>’, ‘/’ or similar symbol.


The descriptive part has fourteen headed sections giving four types 
of information: 


A general definition of the routine: 


• JOB. The task performed by the routine and any special
considerations such as whether the job is time critical.

• ACTION. The method used to perform the job.


System implementation of the routine:


• CPU. The processor for which the routine is written. This may
include details such as the clock frequency specification.

• HARDWARE. Any particular computer or hardware configuration
that the routine is written for and/or specifying exactly what
hardware is used or affected.

• SOFTWARE. The name and essential details of any system
software or subroutines used by the routine.


Operation details of the routine:


• INPUT. The meaning of any flags, registers, memory locations,
stack or other source of information or data passed to the routine.

• OUTPUT. All information passed back from the routine and the
contents or state of changed input variables.

• ERRORS. Any possible errors which could result from inputting
invalid data, the routine being interrupted or from any other
cause.

• REGISTER USE. All registers disturbed by use of the routine
either for passing information or corrupted by the routine.

• STACK USE. The maximum number of bytes that could be
added to the stack during execution of the routine. This includes
deeper subroutine calls from the routine but does not include the
two-byte return address to the program calling the routine.

• RAM USE. Any read/write memory (excluding stack) used for
passing information between the routine and the calling program
or as workspace or storage by the routine. 6502 ‘pseudo-registers’
M0 to MF are included in this section.

• LENGTH. The number of bytes occupied by the assembled
routine.

• CYCLES. This is optional for routines which are not time critical
but if included should ideally be an expression giving the exact
number of clock cycles that the routine executes in. If the timing
is variable then either the maximum and/or a close approximate
value will suffice.


Classification of the routine: 


• CLASS. A set of six criteria indicating situations where the
routine may be used safely without any change. Those met by the
routine are preceded by ‘*’, those not met by ‘-’. Routines are
Class 1 only if all six are satisfied, thus:

* Discreet: no input variable is changed except to pass
information from the routine.

* Interruptable: any interrupt will have no effect on the routine’s
operation, or vice versa. The Z-80 alternate register set is
reserved for interrupt use.

* Promable: the routine can be fixed in read only memory. It
does not alter its own code.

* Re-entrant: the routine can be entered again without error to
either re-entered or re-entering use from a program interrupting
its execution.

* Relocatable: no change is necessary for the routine to operate
correctly at any location.

* Robust: the routine will not ‘crash’ or produce unflagged
erroneous results for any reason when entered at the correct
point(s).


The third part of the Datasheet is the assembler listing. This is a 
standardised form for Sub Set and does not represent the assembler 
conventions of any particular processor. It consists of five fields: 


• Label. Not more than six characters, the first of which must be a
letter.

• Instruction mnemonic or assembler directive specific to the
particular CPU instruction set.

• Operand specific to the particular CPU instruction set.

• Comment. Preceded by a colon. Comments may also take up a
complete line to show the structure of the routine.

• Machine code in hexadecimal. Undefined absolute addresses may
appear as ‘lo hi’ (Z-80 and 6502 form - low order byte first) or
‘hi lo’ (6809 form - high order byte first). The 6502 Page Zero
pseudo-registers are signified by ‘M0’ to ‘MF’. Immediate data
which can differ between systems may appear as any doubled
lower-case letter.

Delays


What better routine to start off with than a 6502 version of the first 
Sub Set datasheet — Alan Tootill’s one-second delay routine, 
DL1S. Here it is showing how the revised documentation 
distinguishes between those aspects which are particular to the 
System and processor and those which are not. The operating 
time of each instruction is given alongside in the comments and 
the total for each section is shown in a full comment line before 
that section. 


=      DL1S        Delay for one second.
:JOB               Time critical routine to use one second's worth
                   of clock cycles, including the call to DL1S.
:ACTION            Set loop count = INT(Clock Hz / n).
                   For count: [ use n cycles ].
                   Fine tune for remainder of cycles.
:CPU               6502 running at 1 MHz.
:HARDWARE          None.
:SOFTWARE          None.
:INPUT             None.
:OUTPUT            None.
:ERRORS            Inaccurate if interrupt occurs, or if loop is
                   located across a page boundary.
:REG USE           None.
:STACK USE         4
:RAM USE           None.
:LENGTH            26
:CYCLES            999994 (as 55553 * 18 + 40)
:CLASS 2           *discreet -interruptable *promable
 *-*-*-            -reentrant *relocatable -robust

LPCNTH =   $D9            :hi and lo bytes of 16-bit
LPCNTL =   $01            :loop counter (value 55553).

:...Save and initialise. Uses 21 cycles.
DL1S   PHP                : 3. Save flags                   08
       PHA                : 3. and registers                48
       TXA                : 2. used                         8A
       PHA                : 3. in the routine.              48
       LDA #LPCNTH        : 2. Get 16-bit loop counter      A9 D9
       LDX #LPCNTL        : 2. in AX and dec it for end     A2 01
       DEX                : 2. test on gone below zero.     CA
       CLD                : 2. Clear for binary subtract.   D8
       SEC                : 2. Set for first subtract.      38
:...Main delay loop. Use up 18 * 55553 - 1 cycles.
:...Carry always set for subtraction if loop occurs.
DLOOP  PHA                : 3. Save count hi-byte, get      48
       TXA                : 2. count lo-byte first          8A
       SBC #1             : 2. and decrement it, take       E9 01
       TAX                : 2. any carry through hi-byte    AA
       PLA                : 4. repeating if C = 1 as        68
       SBC #0             : 2. C = 0 only when count        E9 00
       BCS DLOOP          : 3/2. goes below 0 at end.       B0 F6
:...Restore registers and return. Uses 20 cycles.
       PLA                : 4. Restore                      68
       TAX                : 2. registers                    AA
       PLA                : 4. and                          68
       PLP                : 4. flags used.                  28
       RTS                : 6. Exit after 1 second.         60

ASSEMBLING DL1S FOR OTHER SPEEDS


Assembling routines in clearly defined sections makes it far easier to 
change and update them. When Alan Tootill wrote DL1S, most Z-80 
computers were running at or around 2 MHz but now the Z-80A, 
running at approximately 4 MHz, is being used in many computers. 
Similarly, more modern computers have the 6502 running at about 2 
MHz instead of the original 1 MHz. In fact, it is not very likely that 
your computer’s clock speed will be exactly 1 MHz and you will
probably find it necessary to rewrite DL1S for a delay of precisely
one second.


The 21 cycles used in saving and initialising registers on entry to
DL1S and the 20 cycles needed to restore register values and return
are clearly separated out as unchangeable timing overheads. So too
are the 18 cycles in the main delay section needed to decrement the
count and loop if not zero. (Only 17 cycles are used when the loop
terminates since no branch takes place.) However, there are three
possible variables - the loop count in A and X and two sets of
dummy instructions which may be inserted in the main delay section
and in the restore/return section. Thus, DL1S timing has the
formula,


(clock Hz - 6) = 41 + (LPCNT * ('main' + 18)) - 1 + 'fine',

where:

0 < LPCNT <= 65536, 'main' >= 0, 'fine' >= 0

and dummy instructions are used to exhaust ‘main’ and ‘fine’ times.


Of course, if your 6502 does run at exactly 2 MHz then a one second 
delay is a simple matter of calling DL1S twice in succession. 
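
The DL1S timing formula can be solved mechanically for other clock speeds. What follows is only a minimal sketch in Python, not part of the Sub Set material. The function name `dl1s_parameters` is invented for illustration. It assumes the clock speed is given in cycles per second, and that a leftover of exactly one cycle must be avoided because no 6502 instruction takes fewer than two cycles.

```python
def dl1s_parameters(clock_hz):
    """Solve clock_hz - 6 = 41 + LPCNT * (main + 18) - 1 + fine.

    Returns (lpcnt, main, fine), where 'main' and 'fine' are the numbers
    of cycles to be used up by dummy instructions in the main delay loop
    and in the restore/return section respectively.
    """
    budget = clock_hz - 6 - 41 + 1       # cycles left for the loop and 'fine'
    if budget < 18:
        raise ValueError("clock too slow for the fixed overheads")
    main = 0
    # Lengthen the loop until the count fits in 16 bits. main == 1 is
    # skipped because a single cycle cannot be used up (assumption).
    while budget // (main + 18) > 65536 or main == 1:
        main += 1
    lpcnt = budget // (main + 18)
    fine = budget - lpcnt * (main + 18)
    if fine == 1:                        # trade one loop pass for more 'fine'
        lpcnt -= 1
        fine += main + 18
    if lpcnt < 1:
        raise ValueError("no usable loop count for this clock speed")
    return lpcnt, main, fine


if __name__ == "__main__":
    for hz in (1_000_000, 2_000_000):
        print(hz, dl1s_parameters(hz))
```

For a 1 MHz clock this reproduces the listing's figures: a count of 55553 with no dummy instructions in either section. Finding dummy instructions that take exactly the 'main' and 'fine' cycles returned is still left to the programmer.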


A UNIVERSAL DELAY 


The formula used in DL1S can be used to produce delays other than 
one second by calculating the correct variables and, if necessary, 
finding dummy instructions with the right timings. This is the 
method I used for my delays until one day the thought struck me 
that I was wasting a lot of time. I would willingly put up with a 
longer routine if it could be adapted more quickly to the delay time I 
needed. 


The obvious solution was to write a routine that acted dynamically 
on a number equal to the clock cycles in any given fraction of a 
second. It would have to subtract from that number a value exactly 
equal to the clock cycles taken to perform the subtraction, repeating 
this process until reaching zero when the routine would terminate. 
Of course, it couldn’t be that simple: every routine has certain 
overheads such as the time taken to save and restore register values, 
test for loop terminating conditions, and so on, and these have to be 
taken into account. 


UDELAY is a routine of mine which originally appeared in Sub Set as
UDRS. It is longer and more complex than DL1S but very adaptable.
For a 2 MHz clock speed, the minimum delay is just under 1/8163
second and the maximum (with FRAC = 65535 and input X = 0)
just under 8.4 seconds.
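
These limits follow from the figures given in the UDELAY datasheet: CYCLES = FRAC * X, FRAC assembled between 245 and 65535, and an input of X = 0 counting as 256. The short Python sketch below is an illustration only; the function name `udelay_seconds` is invented here and is not part of Sub Set.

```python
from fractions import Fraction


def udelay_seconds(frac, x, clock_hz):
    """Delay given by UDELAY, taking the datasheet figure CYCLES = FRAC * X."""
    if not 245 <= frac <= 65535:
        raise ValueError("FRAC must be in the range 245 to 65535")
    if not 0 <= x <= 255:
        raise ValueError("X is an 8-bit register")
    repeats = 256 if x == 0 else x       # input X = 0 counts as 256
    return Fraction(frac * repeats, clock_hz)


if __name__ == "__main__":
    shortest = udelay_seconds(245, 1, 2_000_000)
    longest = udelay_seconds(65535, 0, 2_000_000)
    print(float(shortest), float(longest))
```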


The problem with UDELAY is, of course, that the system clock 
specification in any computer is only an ideal — the quartz crystals 
used do vary within a given tolerance. The real clock speed of your 
computer is unlikely to be divisible into any exact fraction of a 
second without any remainder. Nevertheless, the very slight 
inexactitude shouldn’t make any noticeable difference except on 
extremely long delays. 


:JOB               Delay of an input value times a unit delay
                   (assembled fraction of the system clock hertz)
                   including the subroutine call to UDELAY.
:ACTION            On entry: [ Fraction - UDELAY time overheads ].
                   For count: [ Fraction - loop time overheads.
                                Use up fraction cycles ].
:HARDWARE          None.
:SOFTWARE          None.
:INPUT             X = Number of unit delays wanted (00H = 256).
:OUTPUT            X = 0.
:ERRORS            Inaccurate if interrupt occurs or if FRAC is
                   assembled as any value less than 245.
:REG USE           X
:STACK USE         4
:RAM USE           None.
:LENGTH            45
:CYCLES            FRAC * X (including 6 for JSR UDELAY)
:CLASS 2           -discreet -interruptable *promable
                   -reentrant *relocatable -robust

:...Assemble unit delay (FRAC) as the number of clock cycles
:...in the required fraction of a second.
FRAC   =   mmnn           :245 <= mmnn < 65536.
FRHI   =   mm             :INT(FRAC / 256).
FRLO   =   nn             :FRAC - (FRHI * 256).
UDTOH  =   -245           :UDELAY timing overheads, subtracted
                          :from FRAC on 1st entry into NUMLP.
NLTOH  =   -205           :NUMLP timing overheads, subtracted
                          :from FRAC on each repeat of NUMLP.
DLTIME =   18             :DIVLP operating time, subtracted from
                          :FRAC repeatedly until time out.
RCOUNT =   -17            :REMLP iterations - one for each
                          :possible FRAC remainder after DIVLP.

:...Overheads (incl. JSR): 6 + 15 + 205 - 1 + 20 = 245.
UDELAY PHP                : 3. Save flags                   08
       PHA                : 3. and registers                48
       TYA                : 2. used in                      98
       PHA                : 3. UDELAY.                      48
       CLD                : 2. Ensure binary arithmetic.    D8
       LDA #UDTOH         : 2. Negative timing overheads.   A9 0B
:...Repeat X times. Overheads: 10 + 17 + 171 + 7 = 205.
NUMLP  CLC                : 2. Subtract timing overheads    18
       ADC #FRLO          : 2. from unit delay cycles       69 nn
       TAY                : 2. in A and Y. Use addition     A8
       LDA #-1            : 2. of 2's complement for        A9 FF
       ADC #FRHI          : 2. ease of repeats.             69 mm
:...Use up 17 + 18 * INT((FRAC - overheads) / 18) cycles.
DIVLP  PHA                : 3. Subtract DIVLP               48
       TYA                : 2. cycles from reduced          98
       SBC #DLTIME        : 2. FRAC until time out          E9 12
       TAY                : 2. when result                  A8
       PLA                : 4. gone below zero.             68
       SBC #0             : 2. Remainder is in              E9 00
       BCS DIVLP          : 2/3. negative form.             B0 F6
:...Use up 171 + REMAINDER((FRAC - overheads) / 18) cycles.
       LDA #RCOUNT        : 2. Loop for maximum possible    A9 EF
REMLP  INY                : 2. remainder, using 10          C8
       BNE REMCNT         : 2/3. cycles if no remainder     D0 01
       DEY                : 2. in current loop, else        88
REMCNT ADC #1             : 2. use 11 cycles and reset      69 01
       BNE REMLP          : 2/3. remainder count in Y.      D0 F8
       LDA #NLTOH         : 2. Negative timing overheads    A9 33
       DEX                : 2. for repeat of NUMLP.         CA
       BNE NUMLP          : 2/3. Repeat unit delay * X.     D0 DF
       PLA                : 4. Restore registers            68
       TAY                : 2. and flags                    A8
       PLA                : 4. and exit                     68
       PLP                : 4. after                        28
       RTS                : 6. X * shortest delay.          60

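
The section totals quoted in the UDELAY comment lines can be added up to confirm that the routine uses FRAC * X cycles. The Python sketch below does only that arithmetic; it is not an emulation of the 6502. The function name `udelay_cycles` is invented for illustration, and the sketch assumes that no interrupt occurs and that no branch crosses a page boundary.

```python
def udelay_cycles(frac, x):
    """Add up the cycle counts quoted in the UDELAY listing comments."""
    if not 245 <= frac <= 65535:
        raise ValueError("FRAC must be in the range 245 to 65535")
    if not 0 <= x <= 255:
        raise ValueError("X is an 8-bit register")
    repeats = 256 if x == 0 else x        # input X = 0 counts as 256
    cycles = 6 + 15                       # JSR UDELAY plus the entry section
    overhead = 245                        # UDTOH applies to the first pass only
    for n in range(repeats):
        cycles += 10                      # NUMLP: CLC, ADC, TAY, LDA, ADC
        quotient, remainder = divmod(frac - overhead, 18)
        cycles += 18 * quotient + 17      # DIVLP: final pass does not branch
        cycles += 171 + remainder         # REMLP: one extra cycle per remainder
        last = n == repeats - 1
        cycles += 6 if last else 7        # LDA, DEX, BNE (not taken on last pass)
        overhead = 205                    # NLTOH applies to every repeat
    cycles += 20                          # PLA, TAY, PLA, PLP, RTS
    return cycles


if __name__ == "__main__":
    print(udelay_cycles(1000, 3))
```

The subtraction of 245 on the first pass and 205 on each repeat is what makes the total come out as an exact multiple of FRAC, which is also why FRAC cannot be assembled as less than 245.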

DEAD KEY SCROLLS 


Watching the results of all your computation evaporating off the top 
of the screen while your brain is still decoding the first word can be 
very frustrating. Inserting a delay in the scroll routine will slow it 
down, of course, but the perfectionist is still not satisfied. Why not 
have a delay routine that you can switch on and off? Then you can let 
the boring bits whizz past and slow the scroll down only when the 
part you want to scrutinize is coming up. And while you are at it, 
you might as well include a ‘wait’ function to temporarily stop 
scrolling. 


All this is achieved at the touch of a SPACE BAR in this 6502 version of
SLOWUP, which originally appeared in Sub Set in Z-80 code for the
TRS-80. SLOWUP doesn’t actually switch the delay on and off,
instead it alternates between short and long delays so the speed of
both fast and slow scrolls can be adjusted. SLOWUP performs the
switch by complementing the delay count when it notices the SPACE
BAR being pressed. Thus, a long delay of $FFFE becomes the
shortest possible $0001. Fine-tuning of the delays is done by
ANDing a mask with the high order byte of the delay count. For
example, the hi-byte of the delay count could be ANDed with $7F
and this would halve the long delay $FFFE to $7FFE but have no
effect on the short complement $0001.
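
The arithmetic in this paragraph is easy to check. The Python sketch below is an illustration only: the function names `complement` and `fine_tune` are invented here, and the code follows the description in the text (a full 16-bit complement and a mask applied to the hi-byte) rather than any particular listing.

```python
def complement(count):
    """Complement all 16 bits of the delay count."""
    return count ^ 0xFFFF


def fine_tune(count, mask):
    """AND the mask with the high order byte only; the lo-byte is untouched."""
    hi = (count >> 8) & mask
    lo = count & 0xFF
    return (hi << 8) | lo


if __name__ == "__main__":
    long_delay = 0xFFFE
    short_delay = complement(long_delay)
    print("%04X %04X" % (long_delay, short_delay))
    print("%04X %04X" % (fine_tune(long_delay, 0x7F),
                         fine_tune(short_delay, 0x7F)))
```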


Scrolling is not the only activity that could benefit from SLOWUP. 
Try patching it in to those fast action games and give yourself a 
sporting chance to save the Galaxy. 


:JOB               Switch between adjustable long and short delays
                   on SPACE-key press with WAIT function until the
                   SPACE-key is released.
:ACTION            Fine-tune delay count by bit masking.
                   For repeat count:
                   [ For delay count: [ use time ] ].
                   IF SPACE-key pressed THEN:
                   [ Complement delay count.
                     Wait for key release. ]
:CPU               6502
:HARDWARE          Keyboard.
:SOFTWARE          Written to patch in to SCROLL routine.
                   "INKEY" - subroutine to return ASCII code of key
                   pressed in A without changing other registers.
:INPUT             2-byte delay count in M0,1.
                   1-byte delay repeat count in M2.
                   1-byte delay count bit-mask in M3.
:OUTPUT            M0,1 complemented if SPACE BAR pressed.
                   Registers, M2 and M3 unchanged.
:ERRORS            Double-complement can occur if re-entered.
:REG USE           None.
:STACK USE         7 + INKEY stack use in excess of 5.
:RAM USE           M0 to M3
:LENGTH            64
:CYCLES            Minimum delay if SPACE bar not pressed:
                   REPCT * (delaycount * 20 + 36) + 44 + INKEY.
                   { 0 < REPCT <= 256. 0 < delaycount <= 65536. }
:CLASS 2           *discreet *interruptable *promable
 ***-*-            -reentrant *relocatable -robust

SLOCT  =   M0             :Stored 2-byte delay count.
REPCNT =   M2             :Stored coarse-tune delay loop count.
DCMASK =   M3             :Stored fine-tune mask for delay count.
SPACE  =   $20            :ASCII SPACE.

SLOWUP PHP                :Save flags                       08
       PHA                :and registers                    48
       TXA                :used in SLOWUP                   8A
       PHA                :and load X with delay repeat     48
       LDX REPCNT         :count for coarse timing.         A6 M2
:...Delay.
REPLP  LDA SLOCT          :Save delay count                 A5 M0
       PHA                :on stack so it can be            48
       LDA SLOCT+1        :directly used as count.          A5 M1
       PHA                :Fine tune by masking out         48
       AND DCMASK         :selected bits of delay           25 M3
       STA SLOCT+1        :count hi-byte before use.        85 M1
:...Main delay loop. Min. 20, max. 24 cycles each loop.
DELLP  LDA SLOCT          :Test if count lo-byte            A5 M0
       BNE LODEC          :is zero, if it is then           D0 02
       DEC SLOCT+1        :dec hi-byte as well.             C6 M1
LODEC  DEC SLOCT          :Dec count lo-byte.               C6 M0
       LDA SLOCT          :Test if 16-bit                   A5 M0
       ORA SLOCT+1        :delay count is at zero,          05 M1
       BNE DELLP          :repeating until time out.        D0 F2
       PLA                :Restore delay                    68
       STA SLOCT+1        :count to                         85 M1
       PLA                :page zero                        68
       STA SLOCT          :for next delay.                  85 M0
       DEX                :Coarse tune by any repeats       CA
       BNE REPLP          :of main delay.                   D0 DF
:...Possible switch between long/short delay.
       JSR INKEY          :Get any key press and            20 lo hi
       CMP #SPACE         :test for SPACE BAR,              C9 20
       BNE SUPEND         :exit if not pressed.             D0 0D
       LDA SLOCT+1        :SPACE pressed so switch from     A5 M1
       EOR #$FF           :long to short/short to long      49 FF
       STA SLOCT+1        :by bit changes.                  85 M1
:...Wait.
RELLP  JSR INKEY          :Loop until call to INKEY         20 lo hi
       CMP #SPACE         :returns SPACE BAR not            C9 20
       BNE RELLP          :pressed.                         D0 F9
SUPEND PLA                :Restore registers and            68
       TAX                :flags used in SLOWUP             AA
       PLA                :and exit after                   68
       PLP                :long or short                    28
       RTS                :delay.                           60

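
The SLOWUP CYCLES entry gives only the minimum delay, because each pass of the main delay loop takes between 20 and 24 cycles. The Python sketch below is an illustration only, with invented function names. It evaluates the datasheet formula, then adds 4 cycles for every pass in which the lo-byte is found to be zero and the hi-byte is decremented as well. That second figure is an assumption drawn from the 'Min. 20, max. 24' comment, and it further assumes that no branch crosses a page boundary. Here `delaycount` is the count after masking.

```python
def slowup_min_cycles(delaycount, repct, inkey_cycles=0):
    """Datasheet minimum: REPCT * (delaycount * 20 + 36) + 44 + INKEY."""
    if not 0 < delaycount <= 65536:
        raise ValueError("delaycount must be in the range 1 to 65536")
    if not 0 < repct <= 256:
        raise ValueError("REPCT must be in the range 1 to 256")
    return repct * (delaycount * 20 + 36) + 44 + inkey_cycles


def slowup_cycles(delaycount, repct, inkey_cycles=0):
    """Minimum plus 4 cycles for each loop pass that also decrements the hi-byte.

    The lo-byte is zero at the test once for every multiple of 256 that the
    count passes through on its way down to zero (assumption, see lead-in).
    """
    slow_passes = delaycount // 256
    return (slowup_min_cycles(delaycount, repct, inkey_cycles)
            + repct * 4 * slow_passes)


if __name__ == "__main__":
    print(slowup_min_cycles(0xFFFE, 1), slowup_cycles(0xFFFE, 1))
```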


Extra Instructions





Two types of ‘extra instructions’ are dealt with in this chapter. 
The first sort are those which the 6502 will execute if fed 
unspecified codes. The second are desirable operations similar to 
those carried out by other processors but entirely lacking from the 
6502 instruction set. This latter type has to be emulated by short 
routines. 


UNSPECIFIED INSTRUCTIONS 


The 6502 uses only 151 of the 256 instructions made possible by an 
8-bit opcode. The actions caused by the other 105 are ‘unspecified’ 
and no wonder! 


Andrew Johnson obviously had a tremendous amount of patience 
when he worked out the effects of the unspecified codes. Most of 
them combine two distinct normal actions but several produce 
changes to more than one register or memory location. His findings 
are listed here for you to use as you see fit. 


Firstly, the largest group containing instructions which can be 
described as a sequence of 2 specified actions (though perhaps with 
non-specified addressing modes): 


Code        Effect (1)      then    Effect (2)
07 zz       ASL zz                  ORA zz
17 zz       ASL zz,X                ORA zz,X
27 zz       ROL zz                  AND zz
37 zz       ROL zz,X                AND zz,X
47 zz       LSR zz                  EOR zz
57 zz       LSR zz,X                EOR zz,X
67 zz       ROR zz                  ADC zz
77 zz       ROR zz,X                ADC zz,X
A7 zz       LDX zz                  LDA zz
B7 zz       LDX zz,Y                LDA zz,Y
C7 zz       DEC zz                  CMP zz
D7 zz       DEC zz,X                CMP zz,X
E7 zz       INC zz                  SBC zz
F7 zz       INC zz,X                SBC zz,X

03 zz       ASL (zz,X)              ORA (zz,X)
13 zz       ASL (zz),Y              ORA (zz),Y
23 zz       ROL (zz,X)              AND (zz,X)
33 zz       ROL (zz),Y              AND (zz),Y
43 zz       LSR (zz,X)              EOR (zz,X)
53 zz       LSR (zz),Y              EOR (zz),Y
63 zz       ROR (zz,X)              ADC (zz,X)
73 zz       ROR (zz),Y              ADC (zz),Y
A3 zz       LDX (zz,X)              LDA (zz,X)
B3 zz       LDX (zz),Y              LDA (zz),Y
C3 zz       DEC (zz,X)              CMP (zz,X)
D3 zz       DEC (zz),Y              CMP (zz),Y
E3 zz       INC (zz,X)              SBC (zz,X)
F3 zz       INC (zz),Y              SBC (zz),Y

0F lo hi    ASL $hilo               ORA $hilo
1F lo hi    ASL $hilo,X             ORA $hilo,X
2F lo hi    ROL $hilo               AND $hilo
3F lo hi    ROL $hilo,X             AND $hilo,X
4F lo hi    LSR $hilo               EOR $hilo
5F lo hi    LSR $hilo,X             EOR $hilo,X
6F lo hi    ROR $hilo               ADC $hilo
7F lo hi    ROR $hilo,X             ADC $hilo,X
AF lo hi    LDX $hilo               LDA $hilo
BF lo hi    LDX $hilo,Y             LDA $hilo,Y
CF lo hi    DEC $hilo               CMP $hilo
DF lo hi    DEC $hilo,X             CMP $hilo,X
EF lo hi    INC $hilo               SBC $hilo
FF lo hi    INC $hilo,X             SBC $hilo,X

0B nn       AND #nn
1B lo hi    ASL $hilo,Y             ORA $hilo,Y
2B nn       AND #nn
3B lo hi    ROL $hilo,Y             AND $hilo,Y
4B nn       AND #nn                 LSR A
5B lo hi    LSR $hilo,Y             EOR $hilo,Y
6B nn       AND #nn                 ROR A
7B lo hi    ROR $hilo,Y             ADC $hilo,Y
8B nn       TXA                     AND #nn
AB nn       LDA #nn
DB lo hi    DEC $hilo,Y             CMP $hilo,Y
EB nn       SBC #nn
FB lo hi    INC $hilo,Y             SBC $hilo,Y

9C lo hi    STY $hilo,X
9E lo hi    STX $hilo,Y


Secondly, a group which cannot easily be expressed using standard
mnemonics:


Code        Effect
87 zz       :store [A AND X] to zz
97 zz       :store [A AND X] to zz,Y
83 zz       :store [A AND X] to (zz,X)
93 zz       :store [A AND X] to (zz),Y
8F lo hi    :store [A AND X] to $hilo
9F lo hi    :store [A AND X] to $hilo,X
9B lo hi    :store [A AND X] to S and $hilo,Y
BB lo hi    :store [$hilo,Y AND S] to A, X and S
CB nn       :store [A AND X] - #nn to X


Thirdly, some extra NOPs (xx is don’t care):

1-byte:   1A  3A  5A  7A  DA  FA
2-byte:   04 xx  14 xx  34 xx  44 xx  54 xx
          64 xx  74 xx  80 xx  89 xx  F4 xx
3-byte:   0C xx xx  1C xx xx  3C xx xx  5C xx xx
          7C xx xx  DC xx xx  FC xx xx


And finally, a group of 14 single-byte instructions which cause the
6502 to HALT and wait for RESET or possibly an interrupt:


02  12  32  42  52  62  72  82  92  B2  C2  D2  E2  F2


EXTRA ADDRESSING MODES 


The 6502 pre-indexed indirect, post-indexed indirect and absolute 
indexed by Y addressing modes — ‘(zz,X)’, ‘(zz),Y’ and ‘$hilo,Y’ 
respectively — are not supported for the six instructions ‘ASL, ROL, 
LSR, ROR, DEC and INC’ which act on memory. The unspecified 
instructions do use these three addressing modes but since they also 
combine two actions their use is somewhat limited. However, with a 
little care, they can be made to perform as required. 


‘ASL+ORA’ (codes $1B, $13 and $03) will set the correct C, N and 
Z flags for the Arithmetic Shift Left and store the result both in 
memory and in the Accumulator if A is first cleared. With A initially 
zero, the ‘ORA’ part merely loads A with the result of the ‘ASL’. 
Similarly, ‘ROL+AND’ will give the correct ‘ROL’ flags if A is
initially set to $FF. ‘LSR+EOR’ requires A to be zero. 
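
The reasoning can be checked with a small model of the three combined instructions. The Python sketch below is an illustration only and assumes the behaviour described in the text: the shift or rotate acts on memory and sets Carry, then the logical operation acts on A and sets N and Z. The function names are invented for this example.

```python
def asl_ora(mem, a):
    """ASL memory, then ORA it into A. Returns (mem, a, carry, negative, zero)."""
    carry = mem >> 7
    mem = (mem << 1) & 0xFF
    a |= mem
    return mem, a, carry, a >> 7, int(a == 0)


def rol_and(mem, a, carry_in):
    """ROL memory through Carry, then AND it into A."""
    carry = mem >> 7
    mem = ((mem << 1) | carry_in) & 0xFF
    a &= mem
    return mem, a, carry, a >> 7, int(a == 0)


def lsr_eor(mem, a):
    """LSR memory, then EOR it into A."""
    carry = mem & 1
    mem >>= 1
    a ^= mem
    return mem, a, carry, a >> 7, int(a == 0)


if __name__ == "__main__":
    # With A preset as the text advises, A always ends up equal to the
    # shifted memory value, so N and Z describe the shift result.
    for value in range(256):
        assert asl_ora(value, 0x00)[1] == asl_ora(value, 0x00)[0]
        assert rol_and(value, 0xFF, 1)[1] == rol_and(value, 0xFF, 1)[0]
        assert lsr_eor(value, 0x00)[1] == lsr_eor(value, 0x00)[0]
    print("A follows the shifted value for all 256 inputs")
```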


‘ROR+ADC’ cannot be relied upon to give the correct ‘ROR’ result
unless bit 0 of the memory location addressed is known to be reset.
Since bit 0 is moved out to Carry by the rotation, it affects the later
addition.

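
The effect of bit 0 can be seen in a short model. The Python sketch below is an illustration only. It assumes binary mode and, by analogy with the other combined instructions, that A is cleared beforehand; the text does not state that condition for ‘ROR+ADC’. The function name is invented for this example.

```python
def ror_adc(mem, a, carry_in):
    """ROR memory, then ADC it into A (binary mode). Returns (mem, a, carry)."""
    carry = mem & 1                        # bit 0 moves out to Carry
    mem = (mem >> 1) | (carry_in << 7)
    total = a + mem + carry                # ADC adds the Carry just produced
    return mem, total & 0xFF, total >> 8


if __name__ == "__main__":
    # Bit 0 reset: A receives the rotated value unchanged.
    print(ror_adc(0x42, 0x00, 0))
    # Bit 0 set: the Carry from the rotation is added in as well.
    print(ror_adc(0x43, 0x00, 0))
```
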
‘DEC+CMP’ could possibly be useful if A is holding the non-zero 
terminal value for a loop count at the time the decrement occurs, 
though this is not likely. ‘INC+SBC’ doesn’t seem to offer a great 
deal of scope for advanced programming. 



THE USUAL WARNING 


None of the codes are guaranteed to produce the effect that Andrew 
Johnson found them to have, even though he did test them on three 
different computers. Another Sub Set reader, Andrew Civil, has 
found that his particular 6502 treats the opcode ‘9F lo hi’ as meaning
‘Store [A AND X AND ($hi + 1)] to $hilo,Y’ - not especially useful
unless you want to store the result in page $FE.


Assemblers will, of course, throw out non-specified mnemonics and 
operands as invalid so you will have to enter the codes in 
hexadecimal form. And if you manage to get programs using them to 
work in your computer, they may crash when transferred to another. 
You use them at your own risk. 


TOOLKIT 


Many 6502 toolkit routines are quite short and hardly worth the 
bother of writing as subroutines. It takes 3 bytes to call a subroutine 
and one byte to return from it, so a code sequence has to be 5 bytes 
long before any space is saved by writing it as a subroutine. 


In terms of program length, a 5-byte sequence breaks even on the third 
call (three in-line copies take 15 bytes, as do three ‘JSR’s plus the 
6-byte subroutine) and shows a profit of 2 bytes on every subsequent 
call. With 7 bytes of code the break-even point comes on the second call. 


But program length is not the only factor that you may have to take 
into account. Timing overheads are considerable on short sub- 
routines: the ‘JSR’ instruction takes 6 clock cycles and the ‘RTS’ 
uses up another 6, making 12 in all — possibly more than the code 
execution time. You must offset this against the ease of writing and 
the greater program readability gained from using subroutines. 


REGISTER ROTATES, TRANSFERS AND 
EXCHANGES 


The 6502 is sadly lacking in instructions to move data between 
registers. It has a group of six transfer instructions which allow 
values to be transferred between A and the index registers and 
between X and the stack pointer. However, movement between X 
and Y is not allowed and exchanges of data cannot take place. 


ROTREX is a toolkit routine of mine which appeared in Sub Set. It 
provides several ‘instructions’ which I think really ought to be in the 
6502 instruction set but does so only at a tremendous cost in 
execution time. 


ROTREX uses a byte-saving trick that is often found where space is 
minimal. Entry at EXAY falls through to RRAXY but must have 
the Carry flag set whereas RRAXY needs Carry reset. So the 2 bytes 
which save P and reset C at the start of RRAXY have to be skipped. 


This could be done by ‘BCS +2’ but would take 2 bytes. Instead the 
skip is achieved in one byte by using the dummy opcode $2C to 
swallow the 2 bytes in the instruction ‘BIT $1808’. This affects only 
the unused N, Z and V flags — not the important Carry flag. Entry at 
RRAXY misses the dummy opcode, of course, and executes ‘PHP, 
CLC’. The same method is used to save another byte when EXAX 
falls through to RLAXY. 
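

The same trick can be tried wherever two entry points differ only in a 
first 1-byte or 2-byte instruction. The fragment below is a minimal 
sketch and not part of ROTREX: the labels SETFF, SET00 and FLAG are 
hypothetical, and FLAG is assumed to be a page zero location. 


SETFF   LDA  #$FF         :Entry 1: A = $FF, then skip      A9 FF
        .BYTE $2C         :next 2 bytes as "BIT $00A9".     2C
SET00   LDA  #$00         :Entry 2: A = 0.                  A9 00
        STA  FLAG         :Common code: store A             85 FLAG
        RTS               :and return.                      60


Entry at SETFF executes ‘LDA #$FF’, then ‘BIT $00A9’, then the common 
code. The price is a read of location $00A9 and N, Z and V flags that 
no longer describe A, so the method is best avoided if the swallowed 
bytes happen to form the address of hardware that reacts to being read. 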


:= ROTREX    Register rotate, transfer and exchange.
:> TXY       Transfer X to Y.
:> EXAY      Exchange A with Y.
:> RRAXY     Rotate right by one byte through A, X, Y.
:> TYX       Transfer Y to X.
:> EXAX      Exchange A with X.
:> RLAXY     Rotate left by one byte through A, X, Y.
:> EXXY      Exchange X with Y.
:> PLXYAP    Pull register set X, Y, A, P and PC (return).
:> PLYXAP    Pull register set Y, X, A, P and PC (return).
:JOB         Toolkit to perform Class 1 rotations, transfers
:            and exchanges acting on 8-bit registers.
:            Also provides "jump to" register set pull from
:            stack and return.
:ACTION      Push registers.
:            Manipulate on stack.
:            Pull registers in required order.
:CPU         6502
:HARDWARE    None
:SOFTWARE    None
:INPUT       None.
:OUTPUT      (Input registers are denoted by single quotes):
:            TXY:    Y = 'X'; N, Z give sign and zero status.
:            TYX:    X = 'Y'; N, Z give sign and zero status.
:            EXAY:   A = 'Y'; Y = 'A'; P is unchanged.
:            EXAX:   A = 'X'; X = 'A'; P is unchanged.
:            EXXY:   X = 'Y'; Y = 'X'; P is unchanged.
:            RRAXY:  A = 'Y'; X = 'A'; Y = 'X'; P unchanged.
:            RLAXY:  A = 'X'; X = 'Y'; Y = 'A'; P unchanged.
:            PLXYAP: X, Y, A, P, PC pulled (X first).
:            PLYXAP: Y, X, A, P, PC pulled (Y first).
:ERRORS      None.
:REG USE     Any of A, X, Y and P (see OUTPUT).
:STACK USE   4 at each entry point (0 for PLXYAP and PLYXAP).
:RAM USE     None.
:LENGTH      56
:CYCLES      TXY:  72.   RRAXY: 65.   PLXYAP: 26.
:            TYX:  68.   RLAXY: 59.   PLYXAP: 26.
:            EXAX: 64.   EXAY:  68.   EXXY:   42.
:CLASS 1     *discreet   *interruptable  *promable
:******      *reentrant  *relocatable    *robust


ROTREX  =    $0100        :To index page 1 stack.

TXY     TAY               :Move A to Y, X to A              A8
        TXA               :then exchange A with Y.          8A
EXAY    PHP               :Push P then set C for            08
        SEC               :restore in XYAP order.           38
        .BYTE $2C         :Skip 2 bytes by "BIT $1808".     2C
RRAXY   PHP               :Push P then reset C for          08
        CLC               :restore in YXAP order.           18
        PHA               :Y will be stored over this A.    48
        PHA               :A will be pulled to X.           48
        TXA               :X will be                        8A
        PHA               :pulled to Y.                     48
        TYA               :Move Y to A for storing.         98
        CLV               :Clear V for Branch always        B8
        BVC  REPAST       :to X/Y store over stacked A.     50 0C
TYX     TAX               :Move A to X, Y to A              AA
        TYA               :then exchange A with X.          98
EXAX    PHP               :Push P then reset C for          08
        CLC               :restore in YXAP order.           18
        .BYTE $2C         :Skip 2 bytes by "BIT $3808".     2C
RLAXY   PHP               :Push P then set C for            08
        SEC               :restore in XYAP order.           38
        PHA               :X will be stored over this A.    48
        PHA               :A will be pulled to Y.           48
        TYA               :Y will be                        98
        PHA               :pulled to X.                     48
        TXA               :Move X to A for storing.         8A
REPAST  TSX               :Index stack and move stack       BA
        INX               :index to point to first A        E8
        INX               :pushed. "ROTREX+3,X" wouldn't    E8
        INX               :work if stack wrapped around.    E8
        STA  ROTREX,X     :Overwrite 1st A with X or Y      9D 00 01
        BCC  PLYXAP       :then pull in correct order.      90 0D
PLXYAP  PLA               :Exit for                         68
        TAX               :TXY, EXAY and RLAXY.             AA
        PLA               :Also as "jump to"                68
        TAY               :exit for any routine             A8
        PLA               :which has pushed registers       68
        PLP               :in P, A, Y, X order.             28
        RTS               :                                 60
EXXY    PHP               :Push registers in                08
        PHA               :P, A, Y, X order then            48
        TYA               :pull as though pushed            98
        PHA               :in P, A, X, Y order              48
        TXA               :so X and Y are exchanged         8A
        PHA               :without affecting A and P.       48
PLYXAP  PLA               :Exit for EXXY,                   68
        TAY               :TYX, EXAX and RRAXY.             A8
        PLA               :Also as "jump to"                68
        TAX               :exit for any routine             AA
        PLA               :which has pushed registers       68
        PLP               :in P, A, X, Y order.             28
        RTS               :                                 60


USING ROTREX 


ROTREX does provide some elementary functions on which to build 
other missing instructions. Here are three short subroutines to 
perform simple operations that unhappily weren’t programmed into 
the 6502. Apart from the Z flag in the 16-bit left shift, all return the 
correct status, something not done by the ‘PHA, go-through-A, 
PLA’ method. 


:...Transfer Stack pointer to index register Y.
TSY     JSR  EXXY    :Save X in Y and move Stack        20 lo hi
        TSX          :pointer to X, bump it past        BA
        INX          :return address then, by jump      E8
        INX          :to EXXY, transfer it to Y,        E8
        JMP  EXXY    :restore X and return.             4C lo hi
:...Increment accumulator.
INCA    JSR  EXAX    :Save X in A, get A in X and       20 lo hi
        INX          :add 1 affecting only N & Z.       E8
        JMP  EXAX    :Restore A & X and return.         4C lo hi
:...Arithmetic Shift Left XY.
ASLXY   JSR  RRAXY   :Rotate Y into A and shift         20 lo hi
        ASL  A       :left, 0 into bit 0.               0A
        JSR  RRAXY   :Rotate X into A and shift         20 lo hi
        ROL  A       :left, C from Y into bit 0.        2A
        JMP  RRAXY   :Restore all regs and return.      4C lo hi
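

A fourth routine can be sketched in the same way. DECA is a 
hypothetical name and does not appear in Sub Set; it is simply INCA 
with ‘DEX’ in place of ‘INX’, so it should likewise return N and Z for 
the new value of A. 


:...Decrement accumulator (hypothetical example).
DECA    JSR  EXAX    :Save X in A, get A in X and       20 lo hi
        DEX          :subtract 1 affecting only N & Z.  CA
        JMP  EXAX    :Restore A & X and return.         4C lo hi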


QUICKER TRANSFERS, EXCHANGES AND 
ROTATES 


Suitably abashed at the slowness of the ROTREX package I have 
provided TERAXY which demonstrates some quicker methods of 
attaining the same ends. This time though it is at the cost of 
corrupting one page zero location and not being re-entrant. Unlike 
ROTREX, the routines in TERAXY all return possibly useful flag 
information — and operate without the benefit of confusing tricks! 




:= TERAXY    Register transfer, exchange and rotate.
:> TXY       Transfer X to Y.
:> TYX       Transfer Y to X.
:> EXAX      Exchange A with X.
:> EXAY      Exchange A with Y.
:> RRAXY     Rotate right by one byte through A, X, Y.
:> RLAXY     Rotate left by one byte through A, X, Y.
:JOB         Toolkit to perform transfers, exchanges and
:            rotations acting on 8-bit registers, returning
:            Sign and Zero status.
:ACTION      Use temporary RAM store.
:            If necessary, test register to set status.
:CPU         6502
:HARDWARE    None
:SOFTWARE    None
:INPUT       None.
:OUTPUT      M0 changed in each case.
:            (Input registers are denoted by single quotes):
:            TXY:   Y = 'X'; Y sign and zero status in N, Z.
:            TYX:   X = 'Y'; X sign and zero status in N, Z.
:            EXAX:  A = 'X'; X = 'A'; A status in N, Z.
:            EXAY:  A = 'Y'; Y = 'A'; A status in N, Z.
:            RRAXY: A = 'Y'; X = 'A'; Y = 'X';
:                   A status in N, Z.
:            RLAXY: A = 'X'; X = 'Y'; Y = 'A';
:                   A status in N, Z.
:ERRORS      Re-entry would overwrite register values in
:            temporary 1-byte store, M0.
:REG USE     Any of A, X, Y and P (see OUTPUT).
:STACK USE   0
:RAM USE     M0
:LENGTH      40
:CYCLES      TXY: 12.   EXAX: 14.   RRAXY: 18.
:            TYX: 12.   EXAY: 14.   RLAXY: 18.
:CLASS 2     -discreet   *interruptable  *promable
:-**-*-      -reentrant  *relocatable    -robust
TXY     STX  M0      :Move X to temporary storage then  86 M0
        LDY  M0      :into Y, getting N and Z status.   A4 M0
        RTS          :                                  60
TYX     STY  M0      :Move Y to temporary storage then  84 M0
        LDX  M0      :into X, getting N and Z status.   A6 M0
        RTS          :                                  60
EXAX    STX  M0      :Move X to temporary storage.      86 M0
        TAX          :Move A to X then get X from       AA
        LDA  M0      :temp store to A giving N and Z    A5 M0
        RTS          :status of value returned in A.    60
EXAY    STY  M0      :Move Y to temporary storage.      84 M0
        TAY          :Move A to Y then get Y from       A8
        LDA  M0      :temp store to A giving N and Z    A5 M0
        RTS          :status of value returned in A.    60
RRAXY   STX  M0      :Store X temporarily while         86 M0
        TAX          :moving right A to X and           AA
        TYA          :Y to A. Then complete rotation    98
        LDY  M0      :by moving stored X to Y.          A4 M0
        ORA  #0      :Get N and Z status of value now   09 00
        RTS          :in A by logical test of A.        60
RLAXY   STY  M0      :Store Y temporarily while         84 M0
        TAY          :moving left A to Y and            A8
        TXA          :X to A. Then complete rotation    8A
        LDX  M0      :by moving stored Y to X.          A6 M0
        ORA  #0      :Get N and Z status of value now   09 00
        RTS          :in A by logical test of A.        60


STACK TOP EXCHANGES 


The Z-80 has a set of 3 powerful instructions to exchange any of the 
16-bit registers HL, IX or IY with the two bytes on top of stack. 
These execute in 19 or 23 clock cycles. As a rough guide to the worth 
of these instructions, 2 Z-80 clock cycles are usually assumed to be 
the equivalent of one 6502 clock cycle. 


The 6502 lacks any instruction to exchange stack top values. This is 
a pity because the return address to a calling program can be used as 
a pointer to subroutine parameters embedded in the program 
immediately after the ‘JSR’ instruction. 


The only way to exchange a 6502 register with the top of stack is to 
write a routine and EXSX by Robert Whisson shows one way of 
doing the job. In fact it doesn’t exchange X with the current top of 
stack but with the value that was on top immediately before JSR 
EXSX put the 2-byte return address above it. If you decide to use the 
code in sequence rather than as a subroutine you must change the 
stack indexing in EXSX from ‘$0105,X’ to ‘$0103,X’. 


:= EXSX      Exchange (stack pointer), index register.
:JOB         To exchange the 8-bit value held in a register
:            with that on the top of stack.
:ACTION      Push register.
:            Index entry top of stack, get value and push it.
:            Index stacked register value, get value.
:            Index entry top of stack, store register value.
:            Pull top of stack value to register.
:            Pull and discard stacked register value.
:CPU
:HARDWARE
:SOFTWARE
:INPUT       None.
:OUTPUT      X contains the value at top of stack on entry.
:            Top of stack contains the value in X on entry.
:            P is changed (N and Z give status of A).
:ERRORS      The routine could alter one or two values at the
:            start of page two, and not X and (S), if a
:            wraparound stack is used.
:REG USE     X P
:STACK USE
:RAM USE
:LENGTH
:CYCLES
:CLASS 2     -discreet   *interruptable  *promable
:-****-      *reentrant  *relocatable    -robust

EXSX    PHA               :Save A for use in EXSX.          48
        TXA               :Stack X and move stack           8A
        PHA               :pointer to X, so X indexes       48
        TSX               :5 below pre-JSR stack top.       BA
        LDA  $0105,X      :Get value from input stack       BD 05 01
        PHA               :top and save on new top.         48
        LDA  $0101,X      :Get pushed input X and store     BD 01 01
        STA  $0105,X      :to input stack top.              9D 05 01
        PLA               :Restore input stack top          68
        TAX               :value to X (exchange done).      AA
        PLA               :Discard pushed X value.          68
        PLA               :Restore A                        68
        RTS               :and exit.                        60


EXSX can easily be extended to exchange X and Y with the top 2 
bytes of stack (prior to the call) by inserting the following 3 
instructions at the end of the middle section, after ‘STA $0105,X’. 
The extended form (EXSXY) can be called by a subroutine which 
needs to access the return address to the original calling program. 


        TYA               :Move input Y to A and get        98
        LDY  $0106,X      :2nd on stack on entry to Y.      BC 06 01
        STA  $0106,X      :Move input Y to 2nd on stack.    9D 06 01


Register Indirect 


The Register Indirect addressing mode is not implemented on the 
6502 — basically, one suspects, because of the lack of 16-bit 
registers. In the Z-80 and other 8-bit processors which do boast a 
few 16-bit registers it is one of the most powerful addressing 
modes. After all, incrementing and adjusting register contents is 
quick and easy. 





XYMOD by David Heale is a routine which attempts to make good this 
omission by substituting the contents of the X and Y index registers 
(doubled up to hold a 16-bit address) for the address operand of any 
3-byte instruction. The instruction must follow straight on after JSR 
XYMOD. 


As an example, if X contains $AB and Y contains $CD, then 


        JSR  XYMOD   :Call followed by instruction      20 lo hi
        ROL  $FFFF   :with dummy address field.         2E FF FF


will become, 


        JSR  XYMOD   :Call to XYMOD inserts XY          20 lo hi
        ROL  $ABCD   :into ROL address field.           2E CD AB


during the execution of XYMOD without either X or Y being affected. 
If that piece of code is written inside a loop which alters the value of 
X and Y then a different memory location will be rotated in each 
iteration. 
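

As a sketch of such a loop, the fragment below shifts a 256-byte value 
one bit to the left, low byte first. It is an illustration only: the 
address $4000 and the label SHIFT are assumptions, the code must itself 
be in read/write memory, and it relies on XYMOD leaving the Carry flag 
alone, which the XYMOD listing suggests but the datasheet (which simply 
says that P is changed) does not promise. 


        LDX  #$40    :XY = $4000, high order in X,      A2 40
        LDY  #$00    :low order in Y.                   A0 00
        CLC          :Zero will enter bit 0 of $4000.   18
SHIFT   JSR  XYMOD   :Put XY in address field of        20 lo hi
        ROL  $FFFF   :ROL and rotate location (XY).     2E FF FF
        INY          :Next location. INY leaves C       C8
        BNE  SHIFT   :alone. Repeat to end of page.     D0 F7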


:JOB         To replace the 2-byte address operand of an
:            instruction immediately following the call to
:            XYMOD with an address input in registers.
:ACTION      Use return address as pointer to instruction.
:            Index instruction second byte.
:            Move input low order byte to instruction.
:            Index instruction third byte.
:            Move input high order byte to instruction.
:CPU         6502
:HARDWARE    None.
:SOFTWARE    None.
:INPUT       XY contains 16-bit address (high order in X).
:            A 3-byte instruction should follow immediately
:            after "JSR XYMOD" in the calling program.
:OUTPUT      The contents of XY are stored in program bytes 2
:            and 3 after "JSR XYMOD".
:            M1 to M4 and P are changed.
:ERRORS      No check is made on the calling program to test
:            that it is in read/write memory.
:REG USE     X Y P
:STACK USE   None.
:RAM USE     M1 to M4
:LENGTH      29
:CYCLES      61
:CLASS 2     -discreet   *interruptable  *promable
:-****-      *reentrant  *relocatable    -robust


XYMOD   STA  M1      :Save registers A and Y in page    85 M1
        STY  M4      :zero for use in XYMOD.            84 M4
        PLA          :Move stacked Program Counter      68
        STA  M2      :(i.e. return address - 1)         85 M2
        PLA          :to page zero as pointer to        68
        STA  M3      :program bytes to be accessed,     85 M3
        PHA          :and return it to stack            48
        LDA  M2      :for RTS at routine end.           A5 M2
        PHA          :                                  48
        LDY  #2      :Index lo-byte of instruction.     A0 02
        LDA  M4      :Get input lo-byte (saved Y)       A5 M4
        STA  (M2),Y  :and store in program address.     91 M2
        INY          :Index hi-byte of instruction.     C8
        TXA          :Get input hi-byte from X to A     8A
        STA  (M2),Y  :and store in program address.     91 M2
        LDY  M4      :Restore saved Y and A             A4 M4
        LDA  M1      :from page zero and return to      A5 M1
        RTS          :execute modified instruction.     60

THE ZERO OPTION 


XYMOD’s appearance in Sub Set prompted an immediate response 
from readers aghast at the idea of a routine modifying the very code 
that called it. 


Improved, ‘respectable’ versions, by John Kerr (RINXY) and Conor 
O’Neill (XYMODS) don’t even hint at altering the calling program. 
Instead they both access the instruction opcode, using it and the 
address from XY to write a one-off subroutine. Each routine exits 
through its own creation. 


The location chosen by both John and Conor for the single 
instruction subroutine is within the set of 16 page zero 
‘pseudo-registers’, M0 to MF. These are assumed to be dealt with by the 
system as though they were registers, this being the simplest way of 
standardising page zero use to the sort of general purpose routines in 
Sub Set. 


The method of RINXY obviates the need for the address operand in 
the instruction following ‘JSR’ and so the call is implemented in only 
4 bytes (the opcode here is for ‘ROL’): 


        JSR  RINXY   :Call followed by opcode           20 lo hi
        .BYTE $2E    :of instruction RINXY forms.       2E


XYMODS retains the dummy address operand: 


        JSR  XYMODS  :Call followed by essential        20 lo hi
        ROL  $FFFF   :opcode and helpful operand.       2E FF FF


The operand is needed as little by XYMODS as by RINXY but Conor 
was aware that to leave it in makes program readability and 
assembly/disassembly much easier. 


The most important difference to be found between the 2 routines is 
in the way that the constructed subroutines, RISUB and XYSUB, 
terminate. XYMODS pulls the return address from stack and uses it to 
form a 3-byte ‘JMP’ back to the calling program but RINXY, 
corrupting fewer page zero locations, ends the subroutine with a 
1-byte ‘RTS’. However, if the opcode is $4C or $6C (‘JMP $hilo’ or 
‘JMP ($hilo)’) then the end of the page zero subroutine is never 
executed. This is unimportant in XYSUB but in RISUB the skipped 
‘RTS’ will cause a stacking error. 


:JOB         To provide a Register Indirect addressing mode
:            by building up a single instruction subroutine
:            from an opcode following the call to RINXY and
:            a register input address.
:ACTION      Increment return address past opcode byte.
:            Use return address as pointer to opcode byte.
:            Copy opcode byte to byte 1 of single-instruction
:            subroutine.
:            Store 16-bit register contents to bytes 2 and 3
:            as subroutine address operand.
:            Store RETURN opcode to last byte of subroutine.
:            Jump to subroutine.
:CPU         6502
:HARDWARE    None.
:SOFTWARE    "RISUB" - a 4-byte subroutine constructed in
:            page zero to effect the opcode action on the
:            address input to RINXY in XY.
:INPUT       XY contains 16-bit address (high order in X).
:            The 1st byte opcode of a 3-byte instruction must
:            follow "JSR RINXY" in the calling program.
:OUTPUT      M1 to M4 contains a subroutine built from the
:            opcode followed by the contents of XY, followed
:            by $60 (RTS). Control is passed to M1.
:ERRORS      A stacking error will occur if the opcode is $4C
:            or $6C - JMP $hilo or JMP ($hilo).
:REG USE     X Y
:STACK USE   2
:RAM USE     M1 to M4
:LENGTH      41
:CYCLES      Minimum 71.
:CLASS 2     -discreet   *interruptable  *promable
:-****-      *reentrant  *relocatable    -robust

RISUB   =    M1           :Start of 4-byte subroutine location.
OPRTS   =    $60          :Opcode for RTS instruction.

RINXY   PHP               :Save flags                       08
        PHA               :and accumulator.                 48
        LDA  #OPRTS       :Store RTS at end of page         A9 60
        STA  RISUB+3      :zero subroutine, after           85 M4
        STX  RISUB+2      :high order byte of address.      86 M3
        TSX               :Index stack, bump stacked        BA
PAGE    INX               :Program Counter (return          E8
        INC  $0102,X      :address - 1) past 1-byte         FE 02 01
        BEQ  PAGE         :opcode following JSR RINXY.      F0 FA
        TSX               :Index stack and                  BA
        LDA  $0103,X      :copy new return address          BD 03 01
        STA  RISUB        :lo-byte to page zero.            85 M1
        LDA  $0104,X      :then return address hi-byte      BD 04 01
        STA  RISUB+1      :to follow it.                    85 M2
        LDX  #0           :Index program and copy the       A2 00
        LDA  (RISUB,X)    :opcode to start of page zero     A1 M1
        STA  RISUB        :subroutine then insert low       85 M1
        STY  RISUB+1      :order byte of input address.     84 M2
        LDX  RISUB+2      :Restore X from subroutine.       A6 M3
        PLA               :Restore A and P from stack       68
        PLP               :and exit by jump to single       28
        JMP  RISUB        :instruction subroutine.          4C M1 00




:= XYMODS    Copy instruction with modified address operand.
:JOB         To provide a Register Indirect addressing mode
:            by modifying the address operand of a single
:            instruction following the call to XYMODS, copied
:            to a subroutine, with a register input address.
:ACTION      Move return address from stack to subroutine
:            bytes 5 and 6.
:            Use return address as pointer to opcode byte.
:            Copy opcode byte to byte 1 of single-instruction
:            subroutine.
:            Store 16-bit register contents to bytes 2 and 3
:            as subroutine address operand.
:            Increment return address to act as jump address
:            back to program after the instruction.
:            Store JUMP opcode to byte 4 of subroutine.
:            Jump to subroutine.
:CPU         6502
:HARDWARE    None.
:SOFTWARE    "XYSUB" - a 6-byte subroutine constructed in
:            page zero to effect the opcode action on the
:            address input to XYMODS in XY.
:INPUT       XY contains 16-bit address (high order in X).
:            The 3-byte instruction (with dummy address
:            operand) must immediately follow "JSR XYMODS" in
:            the calling program.
:OUTPUT      M1 to M6 contains a subroutine built from the
:            opcode (instruction 1st byte) followed by the
:            contents of XY as address operand and terminated
:            by a JMP built from the return address. Control
:            is passed to M1.
:            M0 and P are changed.
:ERRORS      None.
:REG USE     X Y P
:STACK USE   None.
:RAM USE     M0 to M6
:LENGTH      40
:CYCLES      Minimum 60.
:CLASS 2     -discreet   *interruptable  *promable
:-*****      *reentrant  *relocatable    *robust

XYSUB   =    M1           :Start address of 6-byte subroutine.
OPJMP   =    $4C          :Opcode for JMP instruction.

XYMODS  STA  M0           :Save A not blocking stack.       85 M0
        PLA               :Pull return address              68
        STA  XYSUB+4      :and store in page zero           85 M5
        PLA               :subroutine in bytes              68
        STA  XYSUB+5      :5 and 6.                         85 M6
        STY  XYSUB+1      :Store address operand in         84 M2
        STX  XYSUB+2      :bytes 2 and 3.                   86 M3
        LDY  #1           :Index program and copy           A0 01
        LDA  (XYSUB+4),Y  :opcode to byte 1 of              B1 M5
        STA  XYSUB        :page zero subroutine.            85 M1
        LDA  XYSUB+4      :Increment return address         A5 M5
        CLC               :past 3-byte instruction          18
        ADC  #4           :adding 1 to convert RTS          69 04
        BCC  JAOK         :address to correct JMP           90 02
        INC  XYSUB+5      :address for return.              E6 M6
JAOK    STA  XYSUB+4      :Store JMP opcode before          85 M5
        LDA  #OPJMP       :JMP address so subroutine        A9 4C
        STA  XYSUB+3      :ends with a jump.                85 M4
        LDA  M0           :Restore A and Y from             A5 M0
        LDY  XYSUB+1      :page zero and exit by            A4 M2
        JMP  XYSUB        :jump to subroutine.              4C M1 00


INTELLIGENT REGISTER INDIRECT 


INDXY by Cormac Duffin acts like RINXY to build a 4-byte 
subroutine in page zero but is intelligent enough to test for the ‘JMP’ 
opcodes, $4C and $6C. If the instruction is a jump then INDXY 
removes the return address from stack to prevent the stacking error 
that would occur. 


:JOB         To provide a Register Indirect addressing mode
:            by building up a single instruction subroutine
:            from an opcode following the call to INDXY and
:            a register input address.
:ACTION      Increment return address past opcode byte.
:            Store 16-bit register contents to bytes 2 and 3
:            as subroutine address operand.
:            Store RETURN opcode to last byte of subroutine.
:            Use return address as pointer to opcode byte.
:            Copy opcode byte to byte 1 of subroutine.
:            IF opcode is a JUMP THEN:
:            [ Remove return address from stack. ]
:            Jump to subroutine.
:CPU
:HARDWARE    None.
:SOFTWARE    "INSUB" - a 4-byte subroutine constructed in
:            page zero to effect the opcode action on the
:            address input to INDXY in XY.
:INPUT       XY contains 16-bit address (high order in X).
:            The 1st byte opcode of a 3-byte instruction must
:            follow "JSR INDXY" in the calling program.
:OUTPUT      M1 to M4 contains a subroutine built from the
:            opcode followed by the contents of XY, followed
:            by $60 (RTS). Control is passed to M1.
:            M5 to M8 and Y are changed.
:ERRORS      None.
:REG USE     X Y
:STACK USE   1
:RAM USE     M1 to M8
:LENGTH      52
:CYCLES      Minimum 94.
:CLASS       -discreet   *interruptable  *promable
:-*****      *reentrant  *relocatable    *robust

INSUB   =    M1           :Start address of 4-byte subroutine.
TEMPZ   =    M5           :1st of 4-byte temporary storage.
OPRTS   =    $60          :Opcode for RTS instruction.
OPJMP   =    $4C          :Opcode for JMP instruction.
JMPMSK  =    $DF          :Bitmask to reduce $6C to $4C.

INDXY   STA  TEMPZ        :Save A and P in                  85 M5
        PHP               :page zero temporary              08
        PLA               :store so return address          68
        STA  TEMPZ+1      :on stack is not blocked.         85 M6
        SEC               :Prepare to add.                  38
        PLA               :Pull return address off          68
        ADC  #0           :stack and store it in            69 00
        STA  TEMPZ+2      :page zero, after                 85 M7
        PLA               :incrementing it past             68
        ADC  #0           :opcode byte, as pointer          69 00
        STA  TEMPZ+3      :to opcode byte. Then put         85 M8
        PHA               :adjusted return address          48
        LDA  TEMPZ+2      :back to stack for exit           A5 M7
        PHA               :from INSUB if not JMP.           48
        STX  INSUB+2      :Store address operand            86 M3
        STY  INSUB+1      :to bytes 2 & 3 of INSUB          84 M2
        LDA  #OPRTS       :and terminate it with            A9 60
        STA  INSUB+3      :code $60 for RTS exit.           85 M4
        LDY  #0           :Index program and copy           A0 00
        LDA  (TEMPZ+2),Y  :opcode to byte 1 of              B1 M7
        STA  INSUB        :page zero subroutine.            85 M1
        AND  #JMPMSK      :Test opcode for absolute         29 DF
        CMP  #OPJMP       :or indirect jump, skip           C9 4C
        BNE  RESTOR       :if it isn't, else clear          D0 02
        PLA               :return address off stack         68
        PLA               :to prevent stack error.          68
RESTOR  LDA  TEMPZ+1      :Restore P                        A5 M6
        PHA               :via stack,                       48
        LDA  TEMPZ        :restoring A first so             A5 M5
        PLP               :flags not changed and            28
        JMP  INSUB        :exit to subroutine.              4C M1 00


FINAL THOUGHTS 


INDXY is 52 bytes and 94 cycles, RINXY is 41 bytes and 71 cycles and 
XYMODS is 40 bytes and 60 cycles. David Heale’s XYMOD stands its 
ground at only 29 bytes and 61 cycles. 


The concepts behind these routines are certainly thought provoking 
— subroutines that rewrite the program calling them or build new 
subroutines from a dissection of the program — but is the subject 
worthy? 


Leaving out ‘CPX, CPY, LDX, LDY, STX and STY’ which are 
very improbable uses of an (XY) operand, there are only 17 
instructions which may benefit from this extra addressing mode. 
With a little thought, 8 of them can be programmed in only 4 bytes 
each — the number of bytes taken by ‘JSR’ plus the single-byte 
opcode — using only 2 page zero locations. 


The method is simple: initialise M0 to 0 and keep it there, then use 
the 2-instruction sequence ‘STX M1; opc (M0),Y’. This will 
produce the effect of ‘opc (XY)’ for ‘ADC, AND, CMP, EOR, 
LDA, ORA, SBC and STA’ with a maximum execution time of 9 
clock cycles. 
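

As a minimal sketch of the sequence, here is ‘LDA (XY)’. It assumes 
that M0 and M1 are adjacent page zero locations, that X holds the high 
order byte and Y the low order byte of the address, and that nothing 
else disturbs M0 once it has been cleared. 


        LDA  #0      :Once only: set M0 to zero         A9 00
        STA  M0      :and leave it there.               85 M0
        .            :
        STX  M1      :(M0) now points to $hh00 where    86 M1
        LDA  (M0),Y  :hh was in X; Y adds lo-byte.      B1 M0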


The remaining instructions — ‘BIT, ASL, ROL, LSR, ROR, DEC, 
INC, JSR and JMP’ - do not support post-indexed indirect 
addressing. It may possibly be worthwhile to implement this register 
indirect form for the shift instructions alone but how often do you 
use X and Y to hold a 16-bit address? If you need to store the address 
in page zero in order to use it then it might not be a bad idea to have 
it there from the start, leaving X and Y free for their normal function 
as loop counters and index registers. 
Parameter Access, Save and Restore 


The design of the 6502 can present something of a problem when 
you attempt to write general purpose library routines to Sub Set 
Class 1 standards. Class 1 routines must be ‘discreet’ (they 
should not change any parameters except to pass useful information 
back from the routine) and so the contents of registers and 
memory locations used within a subroutine have to be preserved 
in some way. Since the 6502 allows only P and A to be pushed on 
to the page 1 stack, any other values have first to be transferred or 
loaded into A before they can be pushed. 





If, for example, you need to save all registers and 2 of the page zero 
pseudo-registers then the byte overhead on your subroutine is a 
massive 25 and your source program 21 lines long even before you 
include the actual processing: 


SAVE    PHP          :Save flags,                       08
        PHA          :Accumulator,                      48
        TXA          :Index X via A,                    8A
        PHA          :                                  48
        TYA          :Index Y via A,                    98
        PHA          :                                  48
        LDA  M0      :pseudo-register 0 via A           A5 M0
        PHA          :                                  48
        LDA  M1      :and pseudo-register 1 via A.      A5 M1
        PHA          :                                  48

PROCES  .            :Do                                .
        .            :actual                            .
        .            :processing.                       .

RESTOR  PLA          :Restore pseudo-register 1 via A,  68
        STA  M1      :                                  85 M1
        PLA          :pseudo-register 0 via A,          68
        STA  M0      :                                  85 M0
        PLA          :Index Y via A,                    68
        TAY          :                                  A8
        PLA          :Index X via A,                    68
        TAX          :                                  AA
        PLA          :Accumulator,                      68
        PLP          :and flags.                        28
        RTS          :Exit subroutine.                  60


Each additional page zero location pushed adds 6 bytes to the object 
code and 4 lines to the source program. 


There are other problems. The Accumulator is changed by passing 
the different values through it. If you have used A to send a value to 
the subroutine, that value is embedded deep in the stack and can 
only be obtained by using X to index the stack, adding another 4 
bytes to the routine and changing X: 


GETA    TSX               :Index stack and recover A        BA
        LDA  $0105,X      :from above M1, M0, Y and X.      BD 05 01


Passing values to a subroutine by embedding them in the program 
immediately after the ‘JSR’ instruction is also awkward. Again X has 
to index the stack but in this case the return address, above P, has to 
be moved into page zero before it can be used as the base address of 
the parameter list. 


GETPC   LDA  M2           :Save pseudo-register 2           A5 M2
        PHA               :on stack via A,                  48
        LDA  M3           :then pseudo-register 3           A5 M3
        PHA               :also via A.                      48
        TSX               :Index stack, get ret. addr.      BA
        LDA  $0109,X      :lo-byte from above M3, M2,       BD 09 01
        STA  M2           :M1, M0, Y, X, A and P into       85 M2
        LDA  $010A,X      :M2 then the high order           BD 0A 01
        STA  M3           :byte into M3, both via A.        85 M3


After the parameters have been accessed (usually indexed by Y with 
post-indexed indirect addressing) the number of parameter bytes 
has to be added to the return address in M2 and M3, the new address 
moved back to stack and M2 and M3 restored. 
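

That closing sequence might be sketched as follows. PUTPC and NPARAM 
are hypothetical names, NPARAM being the number of parameter bytes 
embedded after the ‘JSR’; the stack offsets assume that the same eight 
values still lie on the stack as in GETPC. 


PUTPC   TSX               :Index stack again.               BA
        CLC               :Add number of parameter bytes    18
        LDA  M2           :to return address lo-byte        A5 M2
        ADC  #NPARAM      :and put it back on stack above   69 nn
        STA  $0109,X      :M3, M2, M1, M0, Y, X, A and P.   9D 09 01
        LDA  M3           :Add any carry to the high        A5 M3
        ADC  #0           :order byte and put that          69 00
        STA  $010A,X      :back on stack too.               9D 0A 01
        PLA               :Restore pseudo-register 3        68
        STA  M3           :via A,                           85 M3
        PLA               :then pseudo-register 2           68
        STA  M2           :also via A.                      85 M2


Control can then pass to RESTOR as before, and the final ‘RTS’ returns 
to the instruction after the last parameter byte. 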


In a large, structured program, subroutines will be nested to many 
levels. If a large number of those subroutines need to save registers 
and values from the pseudo-register block, as well as access 
embedded parameters, the fixed page stack of the 6502 at only 256 
bytes is soon going to be used up. 


The routines in this chapter attempt to solve these problems 
associated with conserving values and accessing the parameters 
passed to a routine. Probably they will not meet your exact needs but 
what they can do is reduce the opening and closing sequences from 
many bytes to just one or two 3-byte subroutine calls. One thing that 
must be said, however, is that the processes are comparatively slow; 
if you need fast code then the subroutines are unlikely to be of any 
help. 
ACCESS 


BOX and COX are a couple of my routines designed to provide 
complete and easy access to all register and flag values and addresses 
of the routine and its parameters. In the original Sub Set version they 
were combined as a single routine with 2 entry points. They operate 
slightly quicker when separated. 


BOX first pushes P, A, X and Y to stack. If it has been called as the 
first instruction of a routine then the top 8 stack bytes now contain 
the registers followed by the return address to the routine (i.e. the 
address of the third byte of that routine) and the return address to 
the program which called the routine where parameters may be 
embedded. 


The second action of BOX is to exchange the eight page zero 
pseudo-registers, M8 to MF, with the top 8 bytes of stack. The page 
zero values are thus saved and the stacked values available for 
pre-indexed and post-indexed indirect addressing. 


COX is the inverse process and should be called at the end of a routine 
which opens with JSR BOX. It restores the eight page zero values, 
putting the registers and addresses back on stack — after changing 
the values in MC and MD (presumed to be the address of the third 
byte of JSR BOX) to the return address (third byte of JSR COX). 
Then all registers are restored and COX returns to the instruction 
after JSR COX with the return address from the routine on top of 
stack ready for ‘RTS’. 
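

A routine built on the pair might look like the sketch below, which 
fetches one parameter byte embedded after the caller's ‘JSR’ and hands 
it back in A. GETPAR and GPEXIT are hypothetical labels; the use of 
ME,F, MA and the stack follows the description above and the BOX 
datasheet rather than any published routine. 


GETPAR  JSR  BOX     :Stack registers, swap with M8-MF. 20 lo hi
        LDY  #1      :Index byte after caller's JSR     A0 01
        LDA  (ME),Y  :via return address in ME,F.       B1 ME
        STA  MA      :COX will pull A from MA.          85 MA
        INC  ME      :Bump return address past the      E6 ME
        BNE  GPEXIT  :1-byte parameter, with carry      D0 02
        INC  MF      :into the high order byte.         E6 MF
GPEXIT  JSR  COX     :Restore M8-MF, pull registers.    20 lo hi
        RTS          :Return past the parameter.        60


Since COX pulls P from MB, the flags on exit are those of the caller 
and do not reflect the value returned in A. 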


:JOB         To push registers to stack and then exchange the
:            block of stack containing them with a block of
:            readily accessible memory.
:ACTION      Push all registers.
:            Index stack and memory blocks.
:            FOR block length:
:            [ Exchange stack with memory. ]
:            Copy return address to top of stack.
:            Return.
:CPU         6502
:HARDWARE    None.
:SOFTWARE    None.
:INPUT       None.
:OUTPUT      M8 to MF on stack.
:            Input Y in M8. Input X in M9.
:            Input A in MA. Input P in MB.
:            Return address (3rd byte JSR BOX) in MC,D.
:            Top 2-bytes of stack on entry in ME,F.
:            X = 0. Y indexes MF on stack. A and P changed.
:ERRORS      None.
:REG USE     P A X Y
:STACK USE   6
:RAM USE     M8 to MF
:LENGTH      34
:CYCLES      289
:CLASS 2     -discreet   *interruptable  *promable
:-*****      *reentrant  *relocatable    *robust

STACK = $0100 -6502 page 1 stack base. 
PZERO = Zz :Assemble as address above page zero 
:pseudo-register MF as block index. 
BOX PHP -Push registers 08 
PHA :P, A, X and Y 48 
TXA zonto stack 8A 
PHA :in preperation 48 
TYA :for exchange 98 
PHA swith page zero block. 48 
TSX :Index stack by Y at one BA 
TXA :location lower in memory 8A 
TAY :than stacked Y. A8 
LDX #-8 :Index page zero from M8. A2 F8 
BOXL INY :Index next stack byte. C8 
LDA STACK,Y -Get next byte from stack, B9 00 01 
PHA :save it and move next 48 
LDA PZERO,X :byte from page zero block B5 z2 
STA STACK,Y -to corresponding stack. 99 00 01 
PLA :Recover stack byte, move to 68 
STA PZERO,X :corresponding pz byte. 95 22 
INX -Index next pz block byte E8 
BNE BOXL :and repeat for M8 to MF. DO FO 
LDA MD :Copy exchanged return AS MD 
PHA saddress from pz back 48 
LDA MC :to top of stack A5 MC 
PHA sto enable RTS exit 48 
RTS | :from BOX. 60 
:= COX      Block exchange memory/stack, unstack registers. 
:JOB        To exchange a block of readily accessible memory 
:           with a block of stack and then pull registers 
:           from that block. 
:ACTION     Move return address to correct memory location. 
:           Index stack and memory blocks. 
:           FOR block length: 
:           [ Exchange stack with memory. ] 
:           Pull all registers. 
:           Return. 
:CPU 
:HARDWARE 
:SOFTWARE 
:INPUT 
:OUTPUT     Input top 8 bytes of stack in M8 to MF. 
:           Input M8 in Y. Input M9 in X. 
:           Input MA in A. Input MB in P. 
:           Input contents of MC,D are lost. 
:           Input ME,F on top 2 bytes of stack. 
:ERRORS 
:REG USE    P A X Y 
:STACK USE 
:RAM USE 
:LENGTH 
:CYCLES 
:CLASS 2    -discreet *interruptable *promable 
:-*****     *reentrant *relocatable *robust 
STACK  = $0100        :6502 page 1 stack base. 
PZERO  = zz           :Assemble as address above page zero 
                      :pseudo-register MF as block index. 
COX    PLA            :Move return address to        68 
       STA MC         :page zero where exchange      85 MC 
       PLA            :will put it on stack top      68 
       STA MD         :after all registers pulled.   85 MD 
       TSX            :Index stack by Y at one       BA 
       TXA            :location lower in memory      8A 
       TAY            :than start of block.          A8 
       LDX #-8        :Index page zero from M8.      A2 F8 
COXL   INY            :Index next stack byte.        C8 
       LDA PZERO,X    :Get next pz block byte,       B5 zz 
       PHA            :save it and move next         48 
       LDA STACK,Y    :byte from stack to            B9 00 01 
       STA PZERO,X    :corresponding pz block.       95 zz 
       PLA            :Recover pz byte, move to      68 
       STA STACK,Y    :corresponding stack byte.     99 00 01 
       INX            :Index next pz block byte      E8 
       BNE COXL       :and repeat for M8 to MF.      D0 F0 
       PLA            :Restore previous              68 
       TAY            :contents of                   A8 
       PLA            :M8 to MB                      68 
       TAX            :to registers                  AA 
       PLA            :Y, X, A and P, and            68 
       PLP            :exit to correct               28 
       RTS            :return address.               60 


USER STACK 


PSHU and PULU by Martin Ford provide you with the advanced 
stacking facilities of the 6809. With a single call you can save or 
restore any combination of the PC, P, A, X, Y and 16-bit 
pseudo-registers, M0,1, M2,3 and M4,5 to a stack anywhere in 
memory. They appeared in Sub Set as PSH16 and PLL16. 


The parameter byte which determines the registers to be pushed or 
pulled is embedded after the ‘JSR’ instruction. Each bit stands for 
one of the eight registers and if set that register is pushed (PSHU) or 
pulled (PULU). If the bit is reset no push or pull of the corresponding 
register takes place. 


The order in which the registers are dealt with cannot be changed. 
The Program Counter is always pushed first, followed by P, A, X, 
Y, (M5,M4), (M3,M2) and finally, (M1,M0). They are pulled in 
reverse order. 


The User Stack Pointer occupies 2 bytes of page zero, preferably not 
within the 16-byte pseudo-register block. Any of the values on the 
User Stack can be easily accessed by post-indexed indirect addressing: 
set Y equal to the offset of the required byte and execute the 
instruction ‘LDA (USPL),Y’. The top of Ustack has a zero offset 
since the User Stack Pointer addresses the last byte pushed (as in the 
6809) and not the next available location as the 6502 hardware stack 
pointer does. 
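
A short sketch of a call pair follows, assuming the User Stack Pointer has already been initialised and that USPL and USPH are consecutive page zero bytes. The parameter value $38 sets bits 5, 4 and 3, selecting A, X and Y.

```asm
       JSR PSHU       :Push A, X and Y to the User Stack.
       .BYTE $38      :Bits 5, 4 and 3 set: A, X, Y.
       LDY #0         :Y was pushed last, so it lies at
       LDA (USPL),Y   :offset 0 from the User Stack Pointer.
       LDY #2         :A was pushed first of the three,
       LDA (USPL),Y   :so it lies at offset 2.
       JSR PULU       :Pull the same three registers,
       .BYTE $38      :restoring Y, X and A.
```

The two loads overwrite A and Y on purpose: the closing PULU shows that the values saved by PSHU come back unchanged.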


:JOB        To store all the register and pseudo-register 
:           memory values, specified by a byte following the 
:           call to PSHU, to an area of memory dedicated as 
:           "User Stack", indexed by a "User Stack Pointer". 
:ACTION     Save registers and pseudo-registers to system 
:           stack in the correct order. 
:           Bump return address past push parameter byte. 
:           Use return address to get parameter byte. 
:           Set register-size code byte. 
:           REPEAT 8 times: 
:           [ IF current parameter byte bit set THEN: 
:             [ Copy byte from stack to User stack. 
:               Index next free User stack location. 
:               IF current register-size code bit set THEN: 
:               [ Copy byte from stack to User stack. 
:                 Index next free User stack location. ] ] 
:             Index next stack byte. 
:             IF current register-size code bit set THEN: 
:             [ Index next stack byte. ] ] 
:           Restore pseudo-registers and registers from 
:           system stack in correct order. 
:CPU        6502 
:HARDWARE   RAM dedicated as "User Stack", not limited by 
:           any page boundary restrictions. 
:SOFTWARE   None. 
:INPUT      "USP" addresses current top-of-Ustack, one byte 
:           above next free Ustack location. 
:           Set bits in the parameter byte immediately after 
:           "JSR PSHU" specify which registers and page zero 
:           words are to be pushed: 
:             Bit 0: M0,1    Bit 4: X 
:             Bit 1: M2,3    Bit 5: A 
:             Bit 2: M4,5    Bit 6: P 
:             Bit 3: Y       Bit 7: PC 
:OUTPUT     "USP" addresses last byte pushed to Ustack. 
:           PSHU returns to byte after parameter byte. 
:ERRORS     None. 
:REG USE    None. 
:STACK USE  10 
:RAM USE    "USP" - 2-byte "User Stack Pointer" in page zero 
:           (not within the 16-byte pseudo-register block). 
:LENGTH     107 
:CYCLES     Not given. 
:CLASS 1    *discreet *interruptable *promable 
:******     *reentrant *relocatable *robust 
USPL   = qq           :Address of stored User Stack Pointer 
USPH   = pp           :high and low bytes in page zero. 
ZIB    = zz           :Assemble as M0-1 to act as index base 
                      :for addressing pseudo-register block. 
PSHU   PHP            :Push registers                08 
       PHA            :P, A, X and Y                 48 
       TXA            :onto 6502 page 1              8A 
       PHA            :system stack for easy         48 
       TYA            :access and free for           98 
       PHA            :use in PSHU.                  48 
       TSX            :Index stack below Y.          BA 
       LDY #6         :Index count 6 pz bytes.       A0 06 
PSH1   LDA ZIB,Y      :Push all page zero            B9 zz 00 
       PHA            :pseudo-registers M0 to M5     48 
       DEY            :with M0 on stack top.         88 
       BNE PSH1       :Leave Y = 0.                  D0 F9 
       INC $0105,X    :Increment low order byte      FE 05 01 
       LDA $0105,X    :of return address (and        BD 05 01 
       STA M0         :also high order byte if       85 M0 
       BNE PSH2       :necessary) and copy           D0 03 
       INC $0106,X    :it to page zero so that       FE 06 01 
PSH2   LDA $0106,X    :it can be used to index       BD 06 01 
       STA M1         :and get parameter byte.       85 M1 
       LDA #$87       :Register-size code, 1 byte    A9 87 
       STA M2         :(reset) or 2 bytes (set).     85 M2 
       LDA (M0),Y     :Copy parameter byte to        B1 M0 
       STA M3         :page zero for testing.        85 M3 
       LDA #8         :Set count for 8 bits          A9 08 
       STA M4         :in parameter byte.            85 M4 
       DEY            :Set Y = 255 and (USPL),Y to   88 
       DEC USPH       :index next free Ustack byte.  C6 pp 
:...Push loop, 8 iterations, 1 or 2 bytes pushed each loop. 
PSH3   ROL M3         :Shift next parameter bit to   26 M3 
       BCC PSH4       :carry, skip if no push.       90 10 
       LDA $0106,X    :Copy one byte from stack      BD 06 01 
       STA (USPL),Y   :to Ustack and index next      91 qq 
       DEY            :byte on Ustack.               88 
       BIT M2         :Test push size of current     24 M2 
       BPL PSH4       :"register", skip if 1 byte.   10 06 
       LDA $0105,X    :Else copy next byte from      BD 05 01 
       STA (USPL),Y   :stack to Ustack and index     91 qq 
       DEY            :next byte on Ustack.          88 
PSH4   DEX            :Move stack index down one     CA 
       ROL M2         :"register", test reg-size     26 M2 
       BCC PSH5       :and move down by 2 bytes      90 01 
       DEX            :if PC, M0,1, M2,3 or M4,5.    CA 
PSH5   DEC M4         :Repeat for all 8 bits in      C6 M4 
       BNE PSH3       :parameter byte.               D0 E2 
       SEC            :Compute new top-of-Ustack     38 
       TYA            :address,                      98 
       ADC USPL       :USPH,L = USPH,L + Y + 1,      65 qq 
       STA USPL       :(since Y + USPH,L indexes     85 qq 
       BCC PSH6       :next free byte) leaving       90 02 
       INC USPH       :USP addressing last push.     E6 pp 
PSH6   LDY #0         :Index block of 6 pz bytes.    A0 00 
PSH7   INY            :Index next byte and           C8 
       PLA            :restore from 6502 page 1      68 
       STA ZIB,Y      :stack to correct page 0       99 zz 00 
       CPY #6         :location, repeating for       C0 06 
       BNE PSH7       :M0 to M5.                     D0 F7 
       PLA            :Restore                       68 
       TAY            :all                           A8 
       PLA            :registers                     68 
       TAX            :and flags from                AA 
       PLA            :6502 page 1 stack and         68 
       PLP            :return to program location    28 
       RTS            :after parameter byte.         60 


:JOB        To restore all the register and pseudo-register 
:           memory values, specified by a byte following the 
:           call to PULU, from a memory area dedicated as 
:           "User Stack", indexed by a "User Stack Pointer". 
:ACTION     Save registers and pseudo-registers to system 
:           stack in the correct order. 
:           Bump return address past pull parameter byte. 
:           Use return address to get parameter byte. 
:           Set register-size code byte. 
:           REPEAT 8 times: 
:           [ IF current parameter byte bit set THEN: 
:             [ Copy byte from User stack to stack. 
:               Index next User stack location. 
:               IF current register-size code bit set THEN: 
:               [ Copy byte from User stack to stack. 
:                 Index next User stack location. ] ] 
:             Index next stack byte. 
:             IF current register-size code bit set THEN: 
:             [ Index next stack byte. ] ] 
:           Restore pseudo-registers and registers from 
:           system stack in correct order. 
:CPU        6502 
:HARDWARE   RAM dedicated as "User Stack", not limited by 
:           any page boundary restrictions. 
:SOFTWARE   None. 
:INPUT      "USP" addresses current top-of-Ustack, the next 
:           byte to be pulled. 
:           Set bits in the parameter byte immediately after 
:           "JSR PULU" specify which registers and page zero 
:           words are to be pulled: 
:             Bit 0: M0,1    Bit 4: X 
:             Bit 1: M2,3    Bit 5: A 
:             Bit 2: M4,5    Bit 6: P 
:             Bit 3: Y       Bit 7: PC 
:OUTPUT     "USP" addresses current top-of-Ustack, one byte 
:           higher in memory than last byte pulled. 
:           PULU returns to byte after parameter byte if the 
:           Program Counter has not been pulled, else to the 
:           byte following the address pulled to the PC. 
:ERRORS     None. 
:REG USE    All registers that are restored from Ustack. 
:STACK USE  10 
:RAM USE    "USP" - 2-byte "User Stack Pointer" in page zero 
:           (not within the 16-byte pseudo-register block). 
:           Any of M0,1, M2,3 and M4,5 restored from Ustack. 
:LENGTH     104 
:CYCLES     Not given. 
:CLASS 1 
:****** 
USPL   = qq           :Address of stored User Stack Pointer 
USPH   = pp           :high and low bytes in page zero. 
ZIB    = zz           :Assemble as M0-1 to act as index base 
                      :for addressing pseudo-register block. 
PULU   PHP            :Push registers                08 
       PHA            :P, A, X and Y                 48 
       TXA            :onto 6502 page 1              8A 
       PHA            :system stack for easy         48 
       TYA            :access and free for           98 
       PHA            :use in PULU.                  48 
       LDY #6         :Index count 6 pz bytes.       A0 06 
PUL1   LDA ZIB,Y      :Push all page zero            B9 zz 00 
       PHA            :pseudo-registers M0 to M5     48 
       DEY            :with M0 on stack top.         88 
       BNE PUL1       :Leave Y = 0.                  D0 F9 
       TSX            :Index stack below M0.         BA 
       INC $010B,X    :Increment low order byte      FE 0B 01 
       LDA $010B,X    :of return address (and        BD 0B 01 
       STA M0         :also high order byte if       85 M0 
       BNE PUL2       :necessary) and copy           D0 03 
       INC $010C,X    :it to page zero so that       FE 0C 01 
PUL2   LDA $010C,X    :it can be used to index       BD 0C 01 
       STA M1         :and get parameter byte.       85 M1 
       LDA #$E1       :Register-size code, 1 byte    A9 E1 
       STA M2         :(reset) or 2 bytes (set).     85 M2 
       LDA (M0),Y     :Copy parameter byte to        B1 M0 
       STA M3         :page zero for testing.        85 M3 
       LDA #8         :Set count for 8 bits          A9 08 
       STA M4         :in parameter byte.            85 M4 
:...Pull loop, 8 iterations, 1 or 2 bytes pulled each loop. 
PUL3   ROR M3         :Shift next parameter bit to   66 M3 
       BCC PUL4       :carry, skip if no pull.       90 10 
       LDA (USPL),Y   :Copy one byte from Ustack     B1 qq 
       STA $0101,X    :to stack and index next       9D 01 01 
       INY            :byte on Ustack.               C8 
       BIT M2         :Test pull size of current     24 M2 
       BPL PUL4       :"register", skip if 1 byte.   10 06 
       LDA (USPL),Y   :Else copy next byte from      B1 qq 
       STA $0102,X    :Ustack to stack and index     9D 02 01 
       INY            :next byte on Ustack.          C8 
PUL4   INX            :Move stack index up one       E8 
       ROL M2         :"register", test reg-size     26 M2 
       BCC PUL5       :and move up by 2 bytes        90 01 
       INX            :if PC, M0,1, M2,3 or M4,5.    E8 
PUL5   DEC M4         :Repeat for all 8 bits in      C6 M4 
       BNE PUL3       :parameter byte.               D0 E2 
       CLC            :Compute new top-of-Ustack     18 
       TYA            :address,                      98 
       ADC USPL       :USPH,L = USPH,L + Y,          65 qq 
       STA USPL       :(Y is number of bytes         85 qq 
       BCC PUL6       :pulled) leaving USP           90 02 
       INC USPH       :addressing last pull + 1.     E6 pp 
PUL6   LDY #0         :Index block of 6 pz bytes.    A0 00 
PUL7   INY            :Index next byte and           C8 
       PLA            :restore from 6502 page 1      68 
       STA ZIB,Y      :stack to correct page 0       99 zz 00 
       CPY #6         :location, repeating for       C0 06 
       BNE PUL7       :M0 to M5.                     D0 F7 
       PLA            :Restore all                   68 
       TAY            :registers and flags           A8 
       PLA            :from 6502 page 1 stack and    68 
       TAX            :return either to program      AA 
       PLA            :location after parameter      68 
       PLP            :byte (PC not pulled) or       28 
       RTS            :to new PC + 1.                60 


Program Relative Addressing 






The routines in this chapter all use the current contents of the 
Program Counter — the pointer register used by the CPU to keep 
track of program flow. 


Some instruction sets, notably that of the 6809, support various 
program relative addressing modes. Jumps, subroutine calls, 
loads, stores, shifts and indeed all program control and memory 
operations can be performed relative to the program position of 
the instruction. Consequently, any 6809 program can be position- 
independent and the object code will operate correctly at any 
location. 


The only form of program relative addressing available on the 6502 
is that of the eight conditional Branch instructions. To operate on memory 
you need to specify the actual location, either as an absolute direct 
address or indirectly by reference to two consecutive page zero bytes 
where the address is stored. Direct and indirect addresses can also be 
indexed in various ways using the X and Y registers for many 6502 
operations but not for jumps or subroutine calls. 


DYNAMIC ADDRESSING 


If you can gain access to the address held in the Program Counter 
then you can use it dynamically to address any part of the program, 
or its associated data, as an offset from the current instruction. 
However, the only way to access the contents of the 6502’s Program 
Counter is to take it from the stack inside a subroutine — which must 
be addressed directly. The first 6502 Sub Set routine to do this was 
FIND by P. Nowasad. 


FIND, rather strangely, pulls the return address into Y (high order 
byte) and X, increments YX and then returns, not to the routine 
which called it but to the routine or program which called that one. 
(The increment is needed to form the true return address; the 6502 
stacks the address of the third byte of the ‘JSR’ instruction). 


This sample use of FIND was given in Sub Set (WRITE is a routine to 
print the message pointed to by YX): 


Address  Label  Instruction       Comment 

0000            JSR BLOCK         :Call to get message address. 
0003            JSR WRITE         :Call to print message at YX. 
0060     BLOCK  JSR FIND          :Get message address to YX, 
0063            .BYTE 7           :returning to "JSR WRITE" on 
0064            .TEXT "Message"   :exit from FIND. 


To the best of my knowledge, no Sub Set reader wrote in to ask why, 
since the absolute, non-dynamic, address of BLOCK must have been 
programmed into JSR BLOCK, a routine such as FIND was necessary. 
After all, the entire operation could be simplified to: 


0000            LDY #BLOCKH       :Y = BLOCK address hi-byte. 
0002            LDX #BLOCKL       :X = BLOCK address lo-byte. 
0004            JSR WRITE         :Call to print message at YX. 
0060     BLOCK  .BYTE 7           :Message length. 
0061            .TEXT "Message"   :Message contents. 


The only reason that I can think of for not doing it the second way is 
that an error would occur if the object program (machine code) were 
relocated by a routine intelligent enough to change all ‘JMP’ and 
‘JSR’ address fields to the new position but too dumb to recognise 
that the immediate data in Y and X is an address. 


:JOB        To return control to a program one level higher 
:           than that which called FIND, with the address of 
:           the FIND-calling program in registers. 
:ACTION     Pull return address to registers. 
:           Return. 
:CPU        6502 
:HARDWARE   None. 
:SOFTWARE   None. 
:INPUT      The program needing the address of the memory 
:           block must have called that block; the block 
:           must then have immediately called FIND. 
:OUTPUT     YX addresses the location following "JSR FIND". 
:           P and A are changed. 
:           Exit is to the program which called the block 
:           that called FIND. 
:ERRORS     None. 
:REG USE    P A X Y 
:STACK USE  None (-2). 
:RAM USE    None. 
:LENGTH     9 
:CYCLES     23 (+1 if address high byte is incremented). 
:CLASS 2    -discreet *interruptable *promable 
:-*****     *reentrant *relocatable *robust 
FIND   PLA            :Pull return address - 1          68 
       TAX            :low order byte to X via A.       AA 
       PLA            :Pull return address - 1          68 
       TAY            :high order byte to Y via A.      A8 
       INX            :Increment to address byte        E8 
       BNE FNDOUT     :immediately after "JSR FIND",    D0 01 
       INY            :exit to program which called     C8 
FNDOUT RTS            :block with block address in YX.  60 

FIND OUT WHERE I’M AT 


FIND is actually a variation on the basic concept of a routine which 
outputs, in registers or memory, the address it returns to. Once the 
address has been found, it can be used dynamically to make all 
further addressing program relative. The original idea in Sub Set was 
a 3-byte Z-80 routine: 


FOWIA  POP HL         :Pull (pop in Z-80) return address to    E1 
       PUSH HL        :16-bit register HL. Save it again and   E5 
       RET            :return to instruction addressed by HL.  C9 


FIND can easily be adapted to perform the same operation as FOWIA 
by inserting three instructions at the end of the first section (after 
‘TAY’): 


       PHA            :Save return address back to stack       48 
       TXA            :so exit is to 1st byte of instruction   8A 
       PHA            :following "JSR FIND", addressed by YX.  48 
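
Put together, the adapted routine might read as follows. This is a sketch only: the labels FIND2 and FND2X are hypothetical, chosen to avoid clashing with the original FIND.

```asm
FIND2  PLA            :Pull return address - 1
       TAX            :low order byte to X via A.
       PLA            :Pull return address - 1
       TAY            :high order byte to Y via A.
       PHA            :Save return address back to stack
       TXA            :so exit is to 1st byte of instruction
       PHA            :following "JSR FIND2", addressed by YX.
       INX            :Increment YX to form the
       BNE FND2X      :true return address
       INY            :(carry into high order byte).
FND2X  RTS            :Return to caller of FIND2.
```

The high order byte is still in A after ‘TAY’, so it goes back to the stack first, followed by the low order byte from X, restoring the order in which ‘JSR’ originally stacked them.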


A more useful approach is that of GETLOC by Vincent Fojut. 2 bytes 
longer than the adapted FIND but 6 or 7 cycles quicker, GETLOC 
stores the true return address in page zero where it can be used with 
various indirect addressing modes. 
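
A sketch of one such use: fetching a data byte by its offset from the current position. The labels HERE and TABLE are hypothetical, M0 and M1 are assumed to be consecutive page zero bytes, and the expression ‘#TABLE-HERE’ assumes an assembler that evaluates label differences.

```asm
       JSR GETLOC     :M0,1 = address of HERE.
HERE   LDY #TABLE-HERE :Y = offset of TABLE from HERE.
       LDA (M0),Y     :Get first TABLE byte, program relative.
       RTS
TABLE  .BYTE 1        :Data placed within 255 bytes of HERE.
```

Only the call to GETLOC itself carries an absolute address; the data access works wherever the code is loaded.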


:J0B To return the current Program Counter contents 
: of the calling program. 
:ACTION Pull return address to pseudo-register memory. 


Jump to address in memory. 
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:CPU 6502 

sHARDWARE None. 

:SOFTWARE None. 

> INPUT None. 

“OUTPUT | MO,1 addresses the location immediately 
: following "JSR GETLOC". 

: P and A are changed. 

sERRORS Arithmetic error if D = 1 (decimal mode). 
sREG USE P A 

-STACK USE None. 

>RAM USE MO M1 


: LENGTH 14 

: CYCLES 25 

“CLASS 2 _-discreet *interruptable *promable 

Sa REKK- kreentrant *relocatable -robust 

GETLOC CLC Clear for no-carry addition. 18 
PLA -Pull return address - 1 low 68 
ADC #1 ‘order byte, incrementing for 69 01 
STA MO -true return address, into MO. 85 M0 
PLA -Pull return address - 1 high 68 
ADC #0 ‘order byte, adding any carry 69 00 
STA M1 -from lo-byte increment, to M1. 85 M1 
JMP (MO) :Jump to true return address. 6c MO 00 


PROGRAM RELATIVE SUBROUTINE CALLS 


A subroutine which pulls the return address from stack can, of 
course, use it to address the program which called it and pick up any 
data or parameters embedded in the program. 


This is how RLTVL by Gavin Every picks up the single-byte signed 
offset written as the next byte after JSR RLTVL. The routine could 
just as well pick up a 16-bit offset allowing program relative calls to 
anywhere in memory. 


To convert RLTVL into a relative jump routine, omit the initial two 
pushes of P which clear a space for the call address and change the 
indexing of the stacked return address from ‘$0109,X’ and 
‘$010A,X’ to ‘$0107,X’ and ‘$0108,X’. The section which puts the 
computed offset address on stack does not need to be changed; it 
will now overwrite the return address. 

Note that ‘RTI’ is not simply the equivalent of ‘PLP’ with ‘RTS’. 
The ‘Return from Subroutine’ instruction increments the address 
pulled to the Program Counter but the ‘Return from Interrupt’ does 
not. 


:JOB        To pass control to a subroutine at an address 
:           formed by adding a signed offset byte (-128 to 
:           +127) to the current Program Counter contents. 
:ACTION     Reserve stack space for subroutine address. 
:           Get return address from stack. 
:           Using return address as index, get offset byte. 
:           Add offset to return address. 
:           Store computed address to reserved stack space. 
:           "Return" to displaced subroutine. 
:HARDWARE   None. 
:SOFTWARE   None. 
:INPUT      The signed offset byte must immediately follow 
:           "JSR RLTVL" and must give the displacement of 
:           the desired routine from the next location. 
:OUTPUT     Exit is made to location: 
:           Return address + 2 + signed offset 
:ERRORS     None. 
:REG USE    None. 
:STACK USE  8 
:RAM USE    None. 
:LENGTH     67 
:CYCLES     Minimum: 131. Maximum: 139. 
:CLASS 1    *discreet *interruptable *promable 
:******     *reentrant *relocatable *robust 
PAD    = M0           :2-byte address storage in page zero. 
RLTVL  PHP            :Reserve space for computed       08 
       PHP            :displaced subroutine address.    08 
       PHP            :Save                             08 
       PHA            :flags                            48 
       TXA            :and                              8A 
       PHA            :registers                        48 
       TYA            :on                               98 
       PHA            :stack.                           48 
       LDA PAD+1      :Save two bytes                   A5 M1 
       PHA            :of page zero                     48 
       LDA PAD        :on stack for use                 A5 M0 
       PHA            :indexing offset byte.            48 
       CLD            :Ensure binary addition.          D8 
       TSX            :Index stack from below M0        BA 
       INC $0109,X    :and increment return address     FE 09 01 
       BNE GETOA      :to allow for offset byte,        D0 03 
       INC $010A,X    :addressing offset byte.          FE 0A 01 
GETOA  LDA $0109,X    :Move incremented return          BD 09 01 
       STA PAD        :address to page zero for         85 M0 
       LDY $010A,X    :offset byte indexing,            BC 0A 01 
       STY PAD+1      :leaving hi-byte in Y.            84 M1 
       LDX #0         :Zero index, since (PAD) is       A2 00 
       LDA (PAD,X)    :address of offset. Get offset    A1 M0 
       BPL ADDOFF     :byte, if negative then adjust    10 01 
       DEY            :current PC hi-byte.              88 
ADDOFF SEC            :Add offset to adjusted current   38 
       ADC PAD        :Program Counter (+ 1 to allow    65 M0 
       BCC STSA       :for offset byte) giving          90 01 
       INY            :address of displaced routine.    C8 
STSA   TSX            :Index stack from below M0        BA 
       STA $0107,X    :and store actual address of      9D 07 01 
       TYA            :displaced routine in 2 stack     98 
       STA $0108,X    :bytes reserved for it.           9D 08 01 
       PLA            :Restore                          68 
       STA PAD        :two page zero                    85 M0 
       PLA            :bytes used for                   68 
       STA PAD+1      :holding offset address.          85 M1 
       PLA            :Restore registers                68 
       TAY            :Y, X and A                       A8 
       PLA            :by normal pulling, then          68 
       TAX            :use RTI to restore P             AA 
       PLA            :and effect jump to displaced     68 
       RTI            :routine without PC increment.    40 
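
A call through RLTVL might be written as in the sketch below. The labels NEXT and SUB are hypothetical, and the expression ‘SUB-NEXT’ assumes an assembler that evaluates label differences to a single signed byte.

```asm
       JSR RLTVL      :Relative call to SUB.
       .BYTE SUB-NEXT :Signed offset of SUB from next location.
NEXT   RTS            :The RTS in SUB returns here.
SUB    LDA #0         :Hypothetical subroutine body.
       RTS            :Back to NEXT.
```

Since RLTVL increments the stacked return address to point at the offset byte, the ordinary ‘RTS’ at the end of SUB resumes at NEXT, one byte beyond it.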


BIRCH is a Sub Set routine that I wrote to cut down program length 
by providing shorter relative calls. The idea is similar to the use of 
the RESTART instructions of the Z-80 which can function as 
single-byte subroutine calls. 


Both external interrupt and the software interrupt ‘BRK’ cause the 
current contents of the PC to be pushed to stack and the PC loaded 
from memory locations $FFFE and $FFFF. The two types of 
interrupt are distinguished by the state of the Break flag which is 
reset if an external interrupt has occurred but set for ‘BRK’. 


The 6502’s single-byte ‘BRK’ can be followed by any number of 
parameters. In BIRCH, just one byte is appended and if this is a zero 
then a ‘breakpoint’ is indicated. A non-zero second byte is treated as 
an offset and added to the return address forming the relative call 
address. 


:= BIRCH    Break, Interrupt and Relative Call Handler. 
:============================================================ 
:JOB        To distinguish between hardware and software 
:           interrupts, doing preparatory register storage, 
:           and treat software interrupts with a following 
:           non-zero byte as a program relative subroutine 
:           call, the byte giving the signed offset to the 
:           displaced subroutine. 
:ACTION     IF hardware Interrupt: 
:           THEN: 
:           [ Jump to Interrupt service routine. ] 
:           ELSE: 
:           [ Push registers and exchange stack/memory. 
:             IF offset byte = 0: 
:             THEN: 
:             [ Jump to Breakpoint service routine. ] 
:             ELSE: 
:             [ Subroutine Addr. = Return addr. + offset. 
:               Exchange stack/memory and pull registers. 
:               Jump to displaced subroutine. ] ] 
:CPU        6502 
:HARDWARE   None. 
:SOFTWARE   "BOX" - Subroutine to push registers and 
:           exchange stack block with page zero block. 
:           "COX" - Subroutine to exchange page zero block 
:           with stack block and pull registers. 
:           Vectored Break and Interrupt service routines 
:           are needed for the use of those functions. The 
:           Break routine should end "JSR COX; RTS" and the 
:           Interrupt routine should end "RTI". 
:INPUT      If BRK entry then the byte following the BRK 
:           instruction must be either 0 (for the Break 
:           function) or relative subroutine call offset 
:           (-128 to +127). 
:OUTPUT     B = 0 on entry: Jump to vectored Interrupt 
:             service routine with PC, P and A on stack. 
:           B = 1 on entry: 
:             BRK+1 = 0: Jump to vectored Break routine. 
:               M8 to MF on stack. 
:               ME,F = PC (offset addr. or ret. addr. - 1) 
:               MD = Stack pointer immediately before BRK. 
:               MC = Current Stack pointer. 
:               MB, MA, M9, M8 = P, A, X, Y before BRK. 
:               A = 0. X = S. Y = return address low byte. 
:               Reset: D. Set: B, I. Unknown: S, V, Z, C. 
:             BRK+1 > 0: Exit to displaced subroutine. 
:               Register and page zero values as immediately 
:               before BRK. 
:               Return address - 1 on stack top for RTS. 
:ERRORS     None. 
:REG USE    Interrupt: None. Relative Call: None. 
:           Break: Registers changed but saved in page zero. 
:STACK USE  (including return address on stack at entry) 
:           IRQ: 5. Break: 10 (including JSR BOX). 
:           Relative Call: 10 (including JSR BOX, JMP COX). 
:RAM USE    Interrupt: None. Relative Call: None. 
:           Break: M8 to MF are in use but saved on stack. 
:LENGTH     55 
:CYCLES     Interrupt: 19. Break: 354 minimum. 
:           Relative Call: 663 minimum. 
:CLASS      -discreet *interruptable *promable 
:-*****     *reentrant *relocatable *robust 
IRQVEC = $hilo        :Address of location holding address 
                      :of interrupt service routine. 
BRKVEC = $hilo        :Address of location holding address 
                      :of Breakpoint service routine. 
BIRCH  PHA            :Save A and                       48 
       PHP            :move status                      08 
       PLA            :to A for test of                 68 
       AND #$10       :Break flag.                      29 10 
       BNE OSIER      :Skip if set (BRK), else          D0 03 
       JMP (IRQVEC)   :jump to interrupt service.       6C lo hi 
OSIER  JSR BOX+2      :Get regs in page zero.           20 lo hi 
       CLD            :Clear for binary addition.       D8 
       STY MD         :MD = Stack pointer at BRK.       84 MD 
       LDA MB         :Get stored status                A5 MB 
       AND #$EF       :register P and clear             29 EF 
       STA MB         :Break flag, then re-store.       85 MB 
BEECH  LDY ME         :Decrement stored return          A4 ME 
       BNE ALDER      :address for RTS return and       D0 02 
       DEC MF         :to address offset byte           C6 MF 
ALDER  DEC ME         :at BRK + 1.                      C6 ME 
       LDA (ME,X)     :Get offset byte and skip if      A1 ME 
       BPL SAVIN      :forward call (with X = 0)        10 01 
       DEX            :else X = $FF as sign extend.     CA 
SAVIN  BEQ BRIAR      :Skip if offset 0 (Break).        F0 0C 
       ADC ME         :Compute offset address - 1:      65 ME 
       TAY            :add offset in X and A to         A8 
       TXA            :return address in MF and ME      8A 
       ADC MF         :result in Y and A.               65 MF 
CEDAR  PHA            :Push offset address - 1          48 
       TYA            :on to top of stack               98 
       PHA            :for RTS to displaced             48 
       JMP COX        :routine from COX.                4C lo hi 
BRIAR  TSX            :Move current Stack pointer       BA 
       STX MC         :to MC to complete info in        86 MC 
       JMP (BRKVEC)   :page zero for Break routine.     6C lo hi 


You might find that a single-byte offset offers too small a range. 
BIRCH can be changed to operate with a 16-bit offset following the 
‘BRK’ instruction. Substitute the following 24 bytes of code for the 
21 bytes in the three sections beginning at the label BEECH and 
ending just before label CEDAR. 

LARCH  LDY #$FF       :Index offset lo-byte at one      A0 FF 
       DEC MF         :less than return address.        C6 MF 
       LDA (ME),Y     :Get offset lo-byte and           B1 ME 
       INC MF         :restore return address.          E6 MF 
       CLC            :Add offset lo-byte to            18 
       ADC ME         :return address lo-byte           65 ME 
       TAY            :with result to Y. Then get       A8 
       LDA (ME,X)     :offset hi-byte + return address  A1 ME 
       ADC MF         :hi-byte + C from lo-byte add.    65 MF 
       CMP MF         :Compare result with return       C5 MF 
       BNE CEDAR      :address to test for zero offset  D0 04 
       CPY ME         :going to breakpoint if zero but  C4 ME 
       BEQ BRIAR      :relative call if not zero.       F0 06 

BREAKPOINTS 


BOX and COX from the last chapter are used to good effect in BIRCH 
which has stored all necessary register contents in page zero when it 
exits to your breakpoint routine. Being located in page zero makes 
them readily accessible for display and the stored Program Counter 
can be used to index program memory. 


After your breakpoint routine has allowed you to review the current 
state of the machine and make any necessary adjustments to register 
contents (including the Program Counter for continuation at a 
different place), end it with JSR COX to re-stack then restore all 
registers and ‘RTS’ to get back into the program. 
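
As a minimal sketch of such a routine, following the description above: the label BRKSRV is hypothetical, and its address is assumed to be the one stored at BRKVEC. It changes the stored copy of A before resuming.

```asm
BRKSRV LDA #0         :Change the stored copy of A (MA)
       STA MA         :so that the program resumes with A = 0.
       JSR COX        :Re-stack page zero block, restore registers.
       RTS            :Return into the interrupted program.
```

A practical breakpoint routine would display M8 to MF before this point. Any code added there must leave the stacked block untouched and change M8 to MF only where an adjustment is intended.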


LONGER BRANCHES 


LONGBR by Vincent Fojut gives to the 6502 almost the same 
branching freedom as the 6809 which can branch on simple (1 flag 
test) or complex (2 or 3 flags tested) conditions. The call to LONGBR 
must, of course, have a 16-bit offset following the ‘JSR’ instruction 
but it also needs a one-byte parameter or mask giving the jump 
conditions. 


Bit 5 of the P register does not show any status and so the 
corresponding bit in the condition mask can be used to indicate 
whether the jump is to be made on set or reset flags. The other bits 
in the mask are used to determine which flags are to be tested. 


The final test is made on the zero status of the complete set of 
selected flags. This does have the unfortunate result of resetting Z if 
ANY tested flag is set but setting Z only if ALL tested flags are 
reset. So, for example, a branch made on the combined state of the 
Zero and Carry flags would have these two asymmetric effects: 

Z  C    BRANCH IF CLEAR    BRANCH IF SET 
        (Z AND C = 0)      (Z OR C = 1) 

0  0    Branch             No Branch 
0  1    No Branch          Branch 
1  0    No Branch          Branch 
1  1    No Branch          Branch 
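
A call taking the ‘branch if set’ column of the table could look like the sketch below. The labels NEXT and FAR are hypothetical (FAR is assumed to be defined elsewhere in the program), and ‘.WORD’ is assumed to be an assembler directive that stores a 16-bit value low order byte first.

```asm
       JSR LONGBR     :Long branch to FAR if Z or C is set.
       .BYTE $23      :Bit 5 set (ANY set), bits 1 and 0 (Z, C).
       .WORD FAR-NEXT :16-bit offset, low order byte first.
NEXT   RTS            :Execution continues here if not met.
```

The mask $23 combines bit 5 ($20, condition true if any tested flag is set) with the Zero ($02) and Carry ($01) bits of the status register.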


:JOB        To test status against a condition mask and 
:           branch to an offset address if conditions true. 
:ACTION     Store status register. 
:           Pull return address as mask and offset address. 
:           Get mask. Isolate true-on-set/reset bit. 
:           Mask status register, isolating test bits. 
:           IF true-on-set/reset = masked-status THEN: 
:           [ Get offset. 
:             Branch address = return address + offset. ] 
:           Bump return/branch address past mask and offset. 
:           Jump to return/branch address. 
:CPU        6502 
:HARDWARE   None. 
:SOFTWARE   None. 
:INPUT      Condition mask byte following "JSR LONGBR". 
:           16-bit offset (low order byte first) immediately 
:           following condition mask byte. 
:           The condition mask is constructed as follows: 
:           Bit 5 = 0: condition true if ALL of the tested 
:                      flags are reset. 
:           Bit 5 = 1: condition true if ANY of the tested 
:                      flags are set. 
:           Bits 7,6,4,3,2,1,0: set to include corresponding 
:                      status bit in test, reset to exclude the 
:                      corresponding status bit: 
:             Bit 7: N  Negative (sign) flag. 
:             Bit 6: V  Overflow flag. 
:             Bit 4: B  Break flag. 
:             Bit 3: D  Decimal mode flag. 
:             Bit 2: I  Interrupt disable flag. 
:             Bit 1: Z  Zero flag. 
:             Bit 0: C  Carry flag. 
:OUTPUT     Condition not met: M0,1 = address of location 
:           following 16-bit offset. 
:           Condition met: M0,1 = address of location at 
:           offset + address of location after offset. 
:           Exit to location (M0,1). 
:           M2 = A. M3 is changed. 
:ERRORS     None. 
:REG USE    None. 
:STACK USE  2 
:RAM USE    M0 to M3 
:LENGTH     81 
:CYCLES     No branch: 115 to 119. Branch: 152 to 156. 
:CLASS      -discreet *interruptable *promable 
:           *reentrant *relocatable *robust 
PCAD   = M0           :Return/branch address 2-byte storage. 
ATS    = M2           :Temporary store for A. 
STATE  = M3           :Store for flags and test results. 
LONGBR STA ATS        :Save A and P in page zero        85 M2 
       PHP            :so not to block stack for        08 
       PLA            :pulling of return address.       68 
       STA STATE      :Also save status for test.       85 M3 
       PLA            :Pull stacked Program Counter     68 
       STA PCAD       :(return address - 1) and         85 M0 
       PLA            :store to page zero as code,      68 
       STA PCAD+1     :offset and return address.       85 M1 
       LDA STATE      :Now save P, copied from          A5 M3 
       PHA            :page zero.                       48 
       TYA            :Save Y for use as index          98 
       PHA            :and temporary storage.           48 
       CLD            :Ensure binary arithmetic.        D8 
       LDY #1         :Index condition mask byte        A0 01 
       LDA (PCAD),Y   :and get in A. Save in Y          B1 M0 
       TAY            :and check bit 5 for reset        A8 
       AND #$20       :or set condition true.           29 20 
       PHP            :Save test result (Z flag).       08 
       TYA            :Recover mask to A.               98 
       AND #$DF       :Mask out reset/set bit 5.        29 DF 
       AND STATE      :Isolate status bits              25 M3 
       PHP            :to test and move flag            08 
       PLA            :results to A, masking out        68 
       AND #2         :all but Z result and save it     29 02 
       STA STATE      :for true/false check.            85 M3 
       PLA            :Recover reset/set truth.         68 
       AND #2         :Isolate Z result and compare     29 02 
       EOR STATE      :with status, skipping if         45 M3 
       BNE INC4       :status not true (no match).      D0 12 
       LDY #3         :Condition met so index offset    A0 03 
       LDA (PCAD),Y   :getting high order byte          B1 M0 
       PHA            :saved on stack.                  48 
       DEY            :Index offset low order byte      88 
       LDA (PCAD),Y   :and get in A.                    B1 M0 
       CLC            :Prepare to add.                  18 
       ADC PCAD       :Add offset lo-byte to            65 M0 
       STA PCAD       :Program Counter lo-byte.         85 M0 
       PLA            :Recover offset hi-byte           68 
       ADC PCAD+1     :and add to PC hi-byte.           65 M1 
       STA PCAD+1     :PCAD = branch address - 4.       85 M1 
INC4 CLC 


RESTOR PLA 


#4 
PCAD 
PCAD 
RESTOR 
PCAD+1 


ATS 


(PCAD) 


:Prepare to add. 

sAdd 4 to return/branch 
:address to skip condition 
:mask and offset and convert 
sreturn address - 1 to true 
return address for jump. 


:Restore Y from 

*stack via A. 

:Restore A from page zero. 
:Restore P and exit by jump 
-to return or branch address. 
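The address arithmetic LONGBR performs can be checked with a small Python model. This is a sketch under stated assumptions rather than a translation of the listing: `longbr_target` is a hypothetical name, memory is a plain byte array, and the layout (a 3-byte JSR, then the mask, then the offset low byte first) is taken from the datasheet.

```python
def longbr_target(memory: bytes, jsr_addr: int, status: int) -> int:
    """Return the address control passes to after 'JSR LONGBR' at jsr_addr.

    Hypothetical model: the JSR occupies 3 bytes, followed by the
    condition mask and a 16-bit offset stored low byte first.
    """
    mask = memory[jsr_addr + 3]
    offset = memory[jsr_addr + 4] | (memory[jsr_addr + 5] << 8)
    after_offset = (jsr_addr + 6) & 0xFFFF       # location following the offset
    tested = status & mask & 0xDF                # tested flags, bit 5 removed
    met = (tested != 0) if mask & 0x20 else (tested == 0)
    if met:
        # 16-bit wrap-around lets a two's complement offset branch backwards.
        return (after_offset + offset) & 0xFFFF
    return after_offset
```

Because the sum is kept to 16 bits, an offset such as $FFF0 acts as a backward branch of 16 bytes, which is what makes the routine usable for position-independent code.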
Search, Sort and Case





The three routines in this chapter perform quite diverse tasks. 
Their one common aspect is that they all make fundamental use 
of the computer’s ability to discriminate between data. 


KEYWORD STRING SEARCH 


Dennis May’s routine MATCH is not quite the usual string search 
routine. You wouldn’t use it to try and match a complete string of 
finite length against entries in a table but rather to test for the 
occurrence of any ‘keyword’ substrings within an object string which 
may be any length. 


As an example of how useful such a routine is, imagine the need to
compress a large text file. The table would contain 128 of the most
common English words and phrases and MATCH would be used to find all
occurrences of them within the text. The higher level routine, which
calls MATCH, would replace each matched substring by a single byte
token equal to 128 more than its table position. Routines to effect
this compression, using MATCH, and to re-expand the text are in the
next chapter.


There are, of course, many other uses for MATCH. A couple that 
spring to mind are searching for keyword commands in a high level 
language program (such as PRINT, GOSUB, OPEN, and so on) and as 
part of the parsing process to decode the meaning of a sentence in an 
Artificial Intelligence program. 


:JOB        To test for a match between any string held in a
:           table of strings and the whole or first part of
:           an object string, returning table position of
:           string if match found.
:ACTION     String position index = 0.
:           REPEAT:
:           [ REPEAT:
:             [ Index next character position.
:               Compare characters. ]
:             UNTIL characters not equal.
:             IF end of table string:
:             THEN: [ Address object string + 1. ]
:             ELSE: [ Address next table string.
:                     Increment string position index.
:                     Test for end of table. ] ]
:           UNTIL string match OR end of table.
:CPU        6502
:HARDWARE   Object string and string table in memory.
:SOFTWARE   None.
:INPUT      M0,1 addresses start of object string to match.
:             String length: >= longest string in table.
:             String terminator: none.
:           M4,5 addresses start of string table.
:             Table length: up to 254 strings.
:             Table terminator: byte $FF.
:             String length: up to 255 characters (bytes).
:             Character (byte) range: 0 to $7F.
:             String terminator: bit 7 set on last byte.
:OUTPUT     X = $FF: Match not found.
:             M2 = X. Y = 0. A = $FF. P is changed.
:             M4,5 addresses table terminator byte ($FF).
:             M0,1 and M3 are unchanged.
:           X < $FF: Match found.
:             M2 = X. A and P are changed.
:             M3 = Y = Length - 1 of matched string.
:             M4,5 addresses start of matched table string.
:             M0,1 addresses end + 1 of matched section of
:             object string.
:ERRORS     Arithmetic error if D = 1 (decimal mode).
:           No overflow check on either string lengths or
:           number of strings in table, both will give
:           erroneous output information.
:REG USE    P A X Y
:STACK USE  None.
:RAM USE    M0 to M5
:LENGTH     61
:CYCLES     Not given.
:CLASS 2    -discreet   *interruptable  *promable
:           *reentrant  *relocatable    -robust
OBJ     = M0            :Stored address of object string.
POS     = M2            :For storing matched string table index.
LEN     = M3            :For storing matched string length - 1.
TAB     = M4            :Stored address of string table.
MATCH   LDY #0          :Initialise string index and      A0 00
        STY POS         :position index to zero.          84 M2
STRLP   DEY             :Initially index byte 0.          88
CHARLP  INY             :Index next byte of both strings. C8
        LDA (OBJ),Y     :Get object byte and, ensuring    B1 M0
        SEC             :no borrow going in, subtract     38
        SBC (TAB),Y     :current table string byte, loop  F1 M4
        BEQ CHARLP      :until bytes don't match.         F0 F8
        CMP #$80        :If difference is only last-char  C9 80
        BEQ FOUND       :bit 7, then string match found.  F0 1E
        DEY             :Else, test from no-match byte.   88
NXTSTR  INY             :Index next string byte and load  C8
        LDA (TAB),Y     :to test sign bit, looping until  B1 M4
        BPL NXTSTR      :last-char (bit 7 set) indexed.   10 FB
        TYA             :Move index to A and, adding in   98
        SEC             :set carry to convert index to    38
        ADC TAB         :string length, add to base       65 M4
        STA TAB         :address so address incremented   85 M4
        BCC ENDTST      :to start of next string in       90 02
        INC TAB+1       :table, or table terminator.      E6 M5
ENDTST  INC POS         :Bump string index.               E6 M2
        LDY #0          :Index byte 0 of next string      A0 00
        LDA (TAB),Y     :and load to test for table       B1 M4
        CMP #$FF        :terminator, repeating if any     C9 FF
        BNE STRLP       :strings left to test.            D0 D9
        STA POS         :Else string index = -1 for       85 M2
        BEQ EXIT        :match not found, go exit.        F0 0B
FOUND   STY LEN         :Store string length - 1.         84 M3
        TYA             :Move last byte index to A and,   98
        ADC OBJ         :adding set carry (from CMP) to   65 M0
        STA OBJ         :convert index to length, add to  85 M0
        BCC EXIT        :string address, addressing byte  90 02
        INC OBJ+1       :following matched substring.     E6 M1
EXIT    LDX POS         :Output X is string position      A6 M2
        RTS             :index (match) or -1 (no match).  60
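The matching rule used by MATCH, including the trick of recognising a terminator by a difference of exactly $80, can be tried out with the following Python model. It is a sketch only: `match` and `make_table` are hypothetical helper names, offsets stand in for the page zero pointers, and the object string is assumed to hold bytes below $80.

```python
def make_table(words):
    """Pack ASCII words with bit 7 set on each last byte, then a $FF end byte."""
    out = bytearray()
    for word in words:
        data = word.encode("ascii")
        out += data[:-1] + bytes([data[-1] | 0x80])
    out.append(0xFF)
    return bytes(out)


def match(obj: bytes, table: bytes):
    """Return (table position, matched length), or (0xFF, 0) if no match.

    The first table string that equals the start of obj wins, as in the
    listing, which scans the table from its first entry.
    """
    pos = 0   # string position index
    tab = 0   # offset of the current table string
    while table[tab] != 0xFF:
        y = 0
        # Compare while the bytes are equal.
        while y < len(obj) and obj[y] == table[tab + y]:
            y += 1
        # A difference of exactly $80 means only the terminator bit differs.
        if y < len(obj) and (obj[y] - table[tab + y]) & 0xFF == 0x80:
            return pos, y + 1
        # Skip to the last byte (bit 7 set) of this table string.
        while table[tab + y] < 0x80:
            y += 1
        tab += y + 1
        pos += 1
    return 0xFF, 0
```

A call such as `match(b"GOSUB 100", make_table(["PRINT", "GOSUB", "OPEN"]))` reports the table position of the keyword and how many object bytes it covered.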
STRING SORT 


ALSORT is another routine by Dennis May. This time the 
comparison is not to search for equality but to determine the correct 
sequence of 2 strings. 


ALSORT uses the bubble sort, which has acquired a reputation for
inefficiency. Bubble sorts are very slow compared to other sorts such
as the Shell sort or Hoare’s ‘Quicksort’, which has the distinction of
being both the quickest and (in its worst case) the slowest known
method of sorting.

Although slow, the bubble sort is very easy to implement and if you
are dealing with elements of different length then it is one of only
two really practical methods, the other being a variation known as
the ‘ripple’ sort. All the exchanges are between two consecutive
elements and consequently there are no problems of spacing as there
would be in other sorts. The Quicksort, for example, could involve
different sized elements from the start and end of the list being
exchanged and this would require all intermediate bytes to be shifted
by the difference between the 2 exchanged elements.
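The point about spacing can be seen in a short Python model of a bubble sort over strings packed end to end in one buffer. This is an illustrative sketch, not a translation of ALSORT: the function name is hypothetical, and the carriage-return terminator is borrowed from the ALSORT datasheet.

```python
CR = 0x0D  # carriage-return string terminator


def bubble_sort_packed(buf: bytearray, count: int) -> None:
    """Sort `count` CR-terminated strings packed end to end in buf, in place."""
    if count < 2:
        return
    exchanged = True
    while exchanged:
        exchanged = False
        cur = 0                                   # start of current string
        for _ in range(count - 1):
            nxt = buf.index(CR, cur) + 1          # start of next string
            end = buf.index(CR, nxt) + 1          # one past the next string
            a = bytes(buf[cur:nxt])
            b = bytes(buf[nxt:end])
            if b < a:
                # Only these two neighbours move; nothing else is shifted.
                buf[cur:end] = b + a
                cur += len(b)                     # current string has moved up
                exchanged = True
            else:
                cur = nxt
```

Because the two exchanged strings are adjacent, the pair always occupies the same span of bytes before and after the exchange, whatever their individual lengths.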
:JOB        To sort a RAM held list of variable
:           length strings into ascending ASCII order.
:ACTION     Bubble sort.
:           IF number of strings > 1 THEN:
:           [ REPEAT:
:             [ Clear pass exchange flag.
:               FOR number of strings - 1:
:               [ IF current string > next string THEN:
:                 [ Move current to temporary.
:                   Move next to current.
:                   Move temporary to current.
:                   Set pass exchange flag. ] ] ]
:             UNTIL pass exchange flag clear. ]
:CPU        6502
:HARDWARE   List of strings in RAM.
:           Temporary RAM storage to hold longest string.
:SOFTWARE   None.
:INPUT      M0,1 addresses start of string list.
:           M2,3 contains number of strings in list.
:           Max. number of strings: 65535.
:           Max. string length: 256 including terminator.
:           String terminator: $0D (carriage return).
:           Character range: 0 to $FF (excluding $0D).
:OUTPUT     List sorted in ascending numerical order.
:           M0 to M3 unchanged. P, A and X changed.
:           M4 to MA, Y and temporary store may be changed.
:ERRORS     Arithmetic error if D = 1 (decimal mode).
:REG USE    P A X Y
:STACK USE  1
:RAM USE    M0 to MA
:LENGTH     161
:CYCLES     Not given.
:CLASS 2    -discreet   *interruptable  *promable
:           -reentrant  -relocatable    -robust

TEMP    =               :Temporary storage during exchange.
LSA     = M0            :2-byte stored start address of list.
INS     = M2            :2-byte stored input number of strings.
NSP     = M4            :2-byte next string pointer store.
SC      = M6            :2-byte string count store.
CSP     = M8            :2-byte current string pointer store.
PXF     = MA            :1-byte pass exchange flag store.
CRST    = $0D           :Carriage-return string terminator.
ALSORT  LDX INS         :Get number of strings lo-byte    A6 M2
        TXA             :in X and A.                      8A
        ORA INS+1       :Test by ORing hi-byte if         05 M3
        BNE ALSB        :number is zero, continue if      D0 01
ALSA    RTS             :not, else exit immediately.      60
ALSB    DEX             :Now test for only 1 string,      CA
        TXA             :lo-byte dec gives zero result    8A
        ORA INS+1       :on OR with hi-byte if only 1.    05 M3
        BEQ ALSA        :Exit if only 1, else sort.       F0 F9
ALSC    LDX INS+1       :Pass initialisation.             A6 M3
        LDY INS         :Set string count to              A4 M2
        STX SC+1        :number of strings in list.       86 M7
        STY SC          :                                 84 M6
        LDX LSA+1       :Set next string pointer          A6 M1
        LDY LSA         :to list start address.           A4 M0
        STX NSP+1       :                                 86 M5
        STY NSP         :                                 84 M4
        LDY #0          :Clear pass exchange flag.        A0 00
        STY PXF         :                                 84 MA
ALSD    LDX NSP+1       :Initially, or if no exchange     A6 M5
        LDA NSP         :last comparison, set current     A5 M4
        STX CSP+1       :pointer to next pointer.         86 M9
        STA CSP         :(Exchange moves pointer).        85 M8
ALSE    LDY #-1         :Find start of next string.       A0 FF
ALSF    INY             :Test from byte 0                 C8
        LDA (NSP),Y     :looping until index Y            B1 M4
        CMP #CRST       :points to terminator             C9 0D
        BNE ALSF        :of current string.               D0 F9
        TYA             :Add index + set Carry            98
        SEC             :as current string length         38
        ADC NSP         :to next string pointer,          65 M4
        STA NSP         :moving                           85 M4
        BCC ALSG        :it from current                  90 02
        INC NSP+1       :to next string.                  E6 M5
ALSG    LDY #-1         :Compare strings.                 A0 FF
ALSH    INY             :Index from byte 0 comparing      C8
        LDA (NSP),Y     :next with current, skip out      B1 M4
        CMP (CSP),Y     :if next < current (C = 0)        D1 M8
        BNE ALSI        :or next > current (C = 1).       D0 04
        CMP #CRST       :If terminator then end loop      C9 0D
        BNE ALSH        :with current = next (C = 1).     D0 F5
ALSI    BCS ALSN        :Skip if next >= current.         B0 34
        LDY #-1         :Exchange strings (1).            A0 FF
ALSJ    INY             :Move current string              C8
        LDA (CSP),Y     :to temporary store,              B1 M8
        STA TEMP,Y      :from byte 0 to                   99 lo hi
        CMP #CRST       :and including terminator.        C9 0D
        BNE ALSJ        :                                 D0 F6
        LDY #-1         :Exchange strings (2).            A0 FF
ALSK    INY             :Move next string                 C8
        LDA (NSP),Y     :down memory to                   B1 M4
        STA (CSP),Y     :replace current string,          91 M8
        CMP #CRST       :from byte 0 to                   C9 0D
        BNE ALSK        :and including terminator.        D0 F7
        TYA             :Add index + set Carry as         98
        SEC             :length of moved string           38
        ADC CSP         :to current string pointer        65 M8
        STA CSP         :so it now addresses 1st byte     85 M8
        BCC ALSL        :of space to receive stored       90 02
        INC CSP+1       :current string.                  E6 M9
ALSL    LDY #-1         :Exchange strings (3).            A0 FF
ALSM    INY             :Move stored current string       C8
        LDA TEMP,Y      :to above shifted down next       B9 lo hi
        STA (CSP),Y     :string, from byte 0 to and       91 M8
        CMP #CRST       :including terminator.            C9 0D
        BNE ALSM        :(CMP leaves Carry set.)          D0 F6
        ROR PXF         :Set bit 7 of pass exchange       66 MA
        LDA #$FF        :flag. Set current exchange       A9 FF
        PHA             :flag on stack so current         48
        BMI ALSO        :pointer not moved again.         30 03
ALSN    LDA #0          :Clear current exchange flag      A9 00
        PHA             :so current pointer is moved.     48
ALSO    LDX SC          :Subtract string done             A6 M6
        BNE ALSP        :(compared, if not exchanged)     D0 02
        DEC SC+1        :from string count.               C6 M7
ALSP    DEC SC          :                                 C6 M6
        LDX SC          :Test for end of pass.            A6 M6
        DEX             :Pass ends when only              CA
        TXA             :one string remains.              8A
        ORA SC+1        :Lo-byte - 1 OR hi-byte           05 M7
        BEQ ALSQ        :is zero when count = 1.          F0 05
        PLA             :Get current exchange flag.       68
        BNE ALSE        :If set then current already      D0 94
        BEQ ALSD        :at next, else go move it.        F0 8A
ALSQ    PLA             :End of pass. Tidy up stack.      68
        BIT PXF         :If pass exchange flag (bit 7)    24 MA
        BPL ALSR        :is clear then end, else          10 03
        JMP ALSC        :another pass to do.              4C lo hi
ALSR    RTS             :Exit ALSORT, list sorted.        60

QUICKER SORTING


ALSORT could be improved in efficiency, making it quicker to 
process long lists, but at the cost of a longer routine. The first 
improvement is quite easy to implement. After the first pass, the 
highest value string has been ‘bubbled’ along to the end of the list 
and so on the second pass the last string need not be tested. On each 
successive pass one more string can be disregarded. Change the 
routine so that at the end of each pass the number of strings to sort is 
decremented and a further pass occurs only if two or more strings 
remain. 


The second improvement is more complex. Instead of beginning 
each pass at the start of the list, you could start at the string 
immediately preceding the first exchange on the last pass. All strings 
before it weren’t exchanged last time and they are not going to be 
exchanged this time either! One extra variable is needed to save the 
address of the last string not exchanged and this becomes the start 
string for the next pass. Whenever the start string is exchanged, 
begin the next pass at the start of the list. 
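The first improvement can be sketched in Python as follows. This is a model of the idea only, working on an ordinary list rather than packed strings; the function name and the comparison counter are hypothetical additions made so the saving can be measured.

```python
def bubble_sort_shrinking(items: list) -> int:
    """Bubble sort in place, disregarding one more item after each pass.

    Returns the number of comparisons made.
    """
    comparisons = 0
    remaining = len(items)            # items still to be sorted
    while remaining >= 2:
        exchanged = False
        for i in range(remaining - 1):
            comparisons += 1
            if items[i + 1] < items[i]:
                items[i], items[i + 1] = items[i + 1], items[i]
                exchanged = True
        if not exchanged:
            break                     # a clean pass: the list is sorted
        remaining -= 1                # highest item is now in its final place
    return comparisons
```

For the four strings PEAR, FIG, APPLE and KIWI this makes 3 + 2 + 1 comparisons, where a version that always scans the whole list would make 3 on every one of its three passes.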


CASE STRUCTURE 


CASEOF by Peter Villadsen is a routine to perform case structure 
logic on an input case-value. It is rather like the ON <variable> 
GOSUB <line number> command in Basic except that the values 
tested need not be sequential. 


Only one byte is compared in each iteration and the search is not for 
the table position of a match but for the address associated with the 
matched value and used to pass control to a case action. Each address 
is stored with its associated case key making all table entries 3 bytes 
long. 


CASEOF can search through any number of case tables since the table 
base address is input in page zero and not programmed in. 
Consequently you can put it to many different uses such as calling 
subroutines by index rather than by address. Referring back to the 
subject of the last chapter, as all case actions are entered by an 
indirect jump to the address stored in page zero pseudo-registers M2 
and M3, they can be written to use only program relative addressing. 


Because indexed addressing is used by CASEOF, the maximum table 
size allowed is 256 bytes, or 84 cases. The 256 possible cases of a 
one-byte key can be tested for and acted on by using the no-match 
ELSE case action as a link setting CASEOF to work on 4 successive 
tables. 
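The table search can be modelled in Python to experiment with table layouts before committing them to memory. This is a sketch under assumptions: `caseof` and `make_table` are hypothetical names, and the layout (a length byte equal to 3 times the number of keys, then the ELSE address, then one key and a lo/hi address per case) follows the format given in the CASEOF datasheet.

```python
def make_table(else_addr: int, cases: dict) -> bytes:
    """Build a case table: length byte, ELSE address, then key + address entries."""
    out = bytearray([3 * len(cases), else_addr & 0xFF, else_addr >> 8])
    for key, addr in cases.items():
        out += bytes([key, addr & 0xFF, addr >> 8])
    return bytes(out)


def caseof(table: bytes, value: int) -> int:
    """Return the action address selected for `value`, or the ELSE address."""
    length = table[0]                 # 3 * number of keys
    index = 0                         # default: the ELSE entry at offset 0
    for y in range(3, length + 1, 3): # keys sit at offsets 3, 6, 9, ...
        if table[y] == value:
            index = y
            break
    return table[index + 1] | (table[index + 2] << 8)
```

With 84 cases the length byte is 252 and the last entry ends at offset 254, which is why a single table fits within one 8-bit index.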




= CASEOF    Case structure.
:JOB        To compare an input case-value with a list of
:           case keys, branching to the associated action on
:           key match or to default action on no key match.
:ACTION     Initialise for first test on case-key 1.
:           REPEAT:
:           [ Index next case-key.
:             Compare input case-value with case-key. ]
:           UNTIL case-key match or end-of-table.
:           IF end-of-table THEN: [ Index "ELSE" case. ]
:           Jump to case address.
:CPU        6502
:HARDWARE   Memory containing case table.
:SOFTWARE   Subroutines associated with each case.
:INPUT      X contains case-value.
:           M0,1 addresses start of case table.
:           Case table format:
:           ELSE case (no key match):
:             (M0,1) + 0    : 3 * keys (i.e. table length - 3).
:             (M0,1) + 1    : ELSE case action address lo-byte.
:             (M0,1) + 2    : ELSE case action address hi-byte.
:           Key cases (n = 1 to 84):
:             (M0,1) + 3n+0 : Case n key.
:             (M0,1) + 3n+1 : Case n action address lo-byte.
:             (M0,1) + 3n+2 : Case n action address hi-byte.
:OUTPUT     M2,3 addresses action associated with either
:           matched case key or ELSE action if no key match.
:           Control passed to location addressed by M2,3.
:           All registers and M0,1 unchanged.
:ERRORS     Program control error if (M0,1) = 0 or is not a
:           multiple of 3.
:REG USE    X
:STACK USE  5 + case routine stack use in excess of 5.
:RAM USE    M0 to M3
:LENGTH     49
:CYCLES     Matched case: 38 + 35 * key position.
:           ELSE case: 57 + 35 * number of keys.
:CLASS 2    *discreet   *interruptable  *promable
:           *reentrant  *relocatable    -robust
CAS     = M0            :2-byte stored address of case table.
ACT     = M2            :2-byte store for action address.
CASEOF  PHP             :Save flags                       08
        PHA             :and registers                    48
        TYA             :for use                          98
        PHA             :in CASEOF,                       48
        TXA             :leaving input case-value         8A
        PHA             :in A for comparisons.            48
        LDX #0          :Index table length byte.         A2 00
        LDY #3          :Initially index 1st key.         A0 03
TSTNXT  CMP (CAS),Y     :Test for key match and           D1 M0
        BEQ FOUND       :exit loop if found.              F0 10
        PHA             :Else save case-value and         48
        TYA             :compare key index with table     98
        CMP (CAS,X)     :length, exiting loop to          C1 M0
        BEQ ELSE        :ELSE case if end of table.       F0 07
        PLA             :Else restore case-value          68
        INY             :and increment                    C8
        INY             :Y to index key                   C8
        INY             :of next case.                    C8
        SEC             :Then ensure branch occurs        38
        BCS TSTNXT      :and go test next key.            B0 EF
ELSE    PLA             :Tidy up stack.                   68
        LDY #0          :Index no-match ELSE case.        A0 00
FOUND   INY             :Index lo-byte of case            C8
        LDA (CAS),Y     :action address and copy          B1 M0
        STA ACT         :it to action address store.      85 M2
        INY             :Index hi-byte of case            C8
        LDA (CAS),Y     :action address and copy          B1 M0
        STA ACT+1       :it to action address store.      85 M3
        PLA             :Restore                          68
        TAX             :all                              AA
        PLA             :registers                        68
        TAY             :and                              A8
        PLA             :flags                            68
        PLP             :then jump to correct             28
        JMP (ACT)       :action for case.                 6C M2 00


Data Compression


A single-density 40-track floppy disk will hold about 100 KBytes. 
This might appear to be an adequate capacity for almost any 
purpose but it is soon used up. One disk will hold only the same 
amount of information as about 40 of the pages in this book. 





Although memory is becoming cheaper, the RAM available in 
most personal computers in present use is still probably in the 
range 16K to 32K. With a program, or suite of programs, loaded 
in to deal with the processing, you will be lucky to have ten book 
pages worth of immediately accessible data. 


Transmitting data by modem is not particularly fast, as you will 
know if you make extensive use of the telephone for inter-computer 
communication. Time is money — with a vengeance on long distance 
calls. 


These are all good reasons for compacting data into as few bytes as 
possible. In this chapter you will find routines to deal with various 
ways of compressing and expanding both numerical and textual 
data. 


12-BIT NUMBERS 


SQSH and EXPD by David Heale can give you 25% space saving on
numerical data, but only if the numbers are held to 12-bit accuracy
and normally take up 2 bytes with bits 12 to 15 either zeros or ‘don’t
care’. The numbers have to be stored in the normal 6502 form of
low-order byte first.


SQSH processes two numbers at a time, treating them as a single
block of 4 bytes (referred to as s1 to s4 in the routine documentation)
with the high nibbles (top 4 bits) of the second and fourth bytes
unused. The result of the processing is a block of 3 packed bytes (d1
to d3 in the routine documentation). The inverse process, EXPD, acts
on a source block of 3 bytes, expanding them to two 12-bit values.


Since post-indexed addressing is used, the pointers are not moved
past the block in either routine. If you repeatedly call SQSH or EXPD
from inside a loop to deal with long strings of numbers, the loop will
have to increment the pointers to the start of each new 4- or 3-byte
block. For SQSH, you can set both source and destination pointers to
the start of the string and the compressed string will replace the first
75% of the source string bytes.
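The nibble shuffling is easier to follow in a high-level model. The Python sketch below follows the ACTION descriptions in the SQSH and EXPD datasheets; the lower-case function names are hypothetical stand-ins for the routines, and the unused high nibbles are masked off explicitly.

```python
def sqsh(src: bytes) -> bytes:
    """Pack two 12-bit values (4 bytes, low byte first) into 3 bytes."""
    s1, s2, s3, s4 = src
    return bytes([
        s1,                                   # d1 = s1
        ((s2 & 0x0F) << 4) | (s3 >> 4),       # d2 = s2-lo : s3-hi
        ((s3 & 0x0F) << 4) | (s4 & 0x0F),     # d3 = s3-lo : s4-lo
    ])


def expd(src: bytes) -> bytes:
    """Expand 3 packed bytes back to two 12-bit values in 4 bytes."""
    s1, s2, s3 = src
    return bytes([
        s1,                                   # d1 = s1
        s2 >> 4,                              # d2 = 0 : s2-hi
        ((s2 & 0x0F) << 4) | (s3 >> 4),       # d3 = s2-lo : s3-hi
        s3 & 0x0F,                            # d4 = 0 : s3-lo
    ])
```

Packing the values $ABC and $123, stored as BC 0A 23 01, gives BC A2 31, and expanding those 3 bytes restores the original 4.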




= SQSH      Squash Data.
:JOB        To compress two 16-bit words of numerical data,
:           held to 12-bit accuracy, into 3 bytes.
:ACTION     Move source byte 1 to destination byte 1.
:           Discard s2-hi. Move s2-lo to d2-hi.
:           Move s3-hi to d2-lo. Move s3-lo to d3-hi.
:           Discard s4-hi. Move s4-lo to d3-lo.
:CPU        6502
:HARDWARE   RAM: 4 bytes source, 3 bytes for destination.
:SOFTWARE   None.
:INPUT      M0,1 addresses first byte of 4-byte source.
:           M2,3 addresses first byte of 3-byte destination.
:OUTPUT     M0,1 & M2,3 and source data are unchanged.
:           Y = 3. A = last squashed byte.
:ERRORS     No check for destination overwrite of source.
:REG USE    A Y P
:STACK USE  None.
:RAM USE    M0 M1 M2 M3
:LENGTH     41
:CYCLES     98
:CLASS 2    -discreet   *interruptable  *promable
:           *reentrant  *relocatable    -robust
SRCES   = M0            :2-byte stored address of source.
DESTS   = M2            :2-byte stored address of destination.
SQSH    LDY #0          :Index first bytes.               A0 00
        LDA (SRCES),Y   :Move source 1                    B1 M0
        STA (DESTS),Y   :to destination 1.                91 M2
        INY             :Index second bytes.              C8
        LDA (SRCES),Y   :Get source 2,                    B1 M0
        ASL A           :lo-nibble                        0A
        ASL A           :into A                           0A
        ASL A           :hi-nibble                        0A
        ASL A           :and store                        0A
        STA (DESTS),Y   :in destination 2.                91 M2
        INY             :Index third bytes.               C8
        LDA (SRCES),Y   :Get source 3,                    B1 M0
        LSR A           :hi-nibble                        4A
        LSR A           :into A                           4A
        LSR A           :lo-nibble                        4A
        LSR A           :and merge                        4A
        DEY             :(indexing second byte)           88
        ORA (DESTS),Y   :with s2 lo-nibble                11 M2
        STA (DESTS),Y   :and store in destination 2.      91 M2
        INY             :Index third bytes.               C8
        LDA (SRCES),Y   :Get source 3,                    B1 M0
        ASL A           :lo-nibble                        0A
        ASL A           :into A                           0A
        ASL A           :hi-nibble                        0A
        ASL A           :and merge                        0A
        INY             :(indexing fourth byte)           C8
        ORA (SRCES),Y   :with s4 lo-nibble.               11 M0
        DEY             :Index third byte and store       88
        STA (DESTS),Y   :in destination 3.                91 M2
        RTS             :Exit, data compressed.           60

= EXPD
:JOB        To expand 3 bytes of numerical data, compressed
:           by "SQSH", back to two 16-bit words with the
:           four most significant bits of each set to zero.
:ACTION     Move source byte 1 to destination byte 1.
:           Move s2-hi to d2-lo. Clear d2-hi.
:           Move s2-lo to d3-hi. Move s3-hi to d3-lo.
:           Move s3-lo to d4-lo. Clear d4-hi.
:CPU        6502
:HARDWARE
:SOFTWARE
:INPUT      M2,3 addresses first byte of 3-byte source.
:           M0,1 addresses first byte of 4-byte destination.
:OUTPUT     M0,1 & M2,3 and source data are unchanged.
:           Y = 4. A = last expanded byte.
:ERRORS     No check for destination overwrite of source.
:REG USE    A Y P
:STACK USE  None.
:RAM USE    M0 M1 M2 M3
:LENGTH     42
:CYCLES
:CLASS      -discreet   *interruptable  *promable
:           *reentrant  *relocatable    -robust
DESTE   = M0            :2-byte stored address of destination.
SRCEE   = M2            :2-byte stored address of source.
EXPD    LDY #0          :Index first bytes.               A0 00
        LDA (SRCEE),Y   :Move source 1 to                 B1 M2
        STA (DESTE),Y   :destination 1.                   91 M0
        INY             :Index second bytes.              C8
        LDA (SRCEE),Y   :Get source 2,                    B1 M2
        LSR A           :hi-nibble into                   4A
        LSR A           :lo-nibble A,                     4A
        LSR A           :clearing hi-nibble               4A
        LSR A           :and store in                     4A
        STA (DESTE),Y   :destination 2.                   91 M0
        LDA (SRCEE),Y   :Get source 2,                    B1 M2
        ASL A           :lo-nibble                        0A
        ASL A           :into                             0A
        ASL A           :hi-nibble A,                     0A
        ASL A           :clearing lo-nibble A,            0A
        INY             :(index third bytes)              C8
        STA (DESTE),Y   :and store in destination 3.      91 M0
        LDA (SRCEE),Y   :Get source 3,                    B1 M2
        LSR A           :hi-nibble                        4A
        LSR A           :into                             4A
        LSR A           :lo-nibble A                      4A
        LSR A           :and merge with                   4A
        ORA (DESTE),Y   :s2-lo, then store                11 M0
        STA (DESTE),Y   :in destination 3.                91 M0
        LDA (SRCEE),Y   :Get source 3 lo-nibble,          B1 M2
        AND #$0F        :clearing hi-nibble,              29 0F
        INY             :(index fourth byte)              C8
        STA (DESTE),Y   :and store in destination 4.      91 M0
        RTS             :Exit, data expanded.             60
TOKENISED TEXT 


TKNIN and TKNOUT are based on a Sub Set Z-80 routine TOKN.
TKNIN is written to make use of the keyword string comparison
routine MATCH from the chapter on Comparisons. TKNOUT does not
use MATCH but it does use the same token expansion table.


TKNIN converts a normal ASCII file to one which is a mixture of
normal ASCII characters ($01 to $7F) and tokens ($80 to $FF). Each
token is the table index of the substring it has replaced, with bit 7
set. The 128 substrings can, of course, be any of the most commonly
used English phrases, words and letter groupings.


TKNOUT is the reverse process. It writes a destination file using the 
ASCII codes from the source but converts codes $80 to $FF to the 
full ASCII expansion found in the table. The file-out will obviously 
be longer than the file-in, although there is no way to tell just how 
much greater it will be. You must make sure that the destination 
area is large enough. 


Both TKNIN and TKNOUT could be adapted to read from and write to 
disk handling routines. TKNOUT could write straight to a printer or 
to screen. 


Here is a short example of how the system works, with the symbol _
used to clearly mark spaces. First, part of the expansion table
(tokens $80 to $94). Bit 7 of the last byte in each string is set to show
termination.

TOKEN   EXPANSION (HEX)            EXPANSION (TEXT)

$80     54 48 45 A0                THE_
$81     49 CE                      IN
$82     57 41 53 A0                WAS_
$83     57 4F 52 C4                WORD
$84     41 4E 44 A0                AND_
$85     48 49 CD                   HIM
$86     4E 4F D4                   NOT
$87     4C 49 47 48 D4             LIGHT
$88     57 49 54 C8                WITH
$89     47 4F C4                   GOD
$8A     4F 55 D4                   OUT
$8B     54 48 49 4E C7             THING
$8C     41 4E 44 20 54 48 45 A0    AND_THE_
$8D     49 4E 20 54 48 45 A0       IN_THE_
$8E     4D 41 44 C5                MADE
$8F     53 48 49 4E C5             SHINE
$90     44 41 52 CB                DARK
$91     45 4E C4                   END
$92     4E 45 53 D3                NESS
$93     43 4F CD                   COM
$94     50 52 C5                   PRE


Now the very repetitive beginning of John’s Gospel using the tokens
$80 to $94.

TOKENISED (HEX)                    EXPANDED (TEXT)

8D 42 45 47 81 4E 81 47 20 82      IN_THE_BEGINNING_WAS_
                                   THE_WORD,_AND_THE_WORD_
                                   WAS_WITH_GOD,_AND_THE_
                                   WORD_WAS_GOD._THE_
                                   SAME_WAS_IN_THE_
                                   BEGINNING_WITH_
                                   GOD._ALL_THINGS_
                                   WERE_MADE_BY_
                                   HIM;_AND_WITHOUT_HIM_
                                   WAS_NOT_ANY_THING_
                                   MADE_THAT_WAS_MADE.
                                   _IN_HIM_WAS_LIFE;_
                                   AND_THE_LIFE_WAS_THE_
                                   LIGHT_OF_MEN._
                                   AND_THE_LIGHT_SHINETH_
                                   IN_DARKNESS;_AND_THE_
90 92 20 93 94 48 91 45 44 20      DARKNESS_COMPREHENDED_
                                   IT_NOT.<terminator>


As you can see, the use of tokens has reduced a 328 character string
(including null terminator) to only 152 bytes. Not quite the Bible on
the back of a postage stamp but nevertheless a space saving of over
50%. The passage does contain more repetition than might be
expected in normal English text but only 21 of the 128 available
tokens have been used.


Such words as ‘IN’, ‘OUT’, ‘WITH’ and ‘END’ have been entered
in the table without spaces since this enables words like ‘WITHOUT’,
‘WITHIN’ or ‘OUTSHINE’ to be formed by concatenation.
They could be repeated in the full expansion table with spaces after,
or even before, to give even more saving.
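The whole scheme can be tried out with a small Python model before any table is committed to memory. This is a sketch under assumptions, not a translation of TKNIN or TKNOUT: the function names and the five-entry table are hypothetical, the table is held as a list rather than as bit-7-terminated bytes, and the first table entry that matches is used, so longer phrases are listed ahead of the shorter words they contain.

```python
def tokenise(text: bytes, words: list) -> bytes:
    """Replace each table word found in text by a token ($80 + table index)."""
    out = bytearray()
    i = 0
    while i < len(text):
        for index, word in enumerate(words):
            if text.startswith(word, i):      # first matching entry wins
                out.append(0x80 | index)
                i += len(word)
                break
        else:
            out.append(text[i])               # no match: copy the character
            i += 1
    out.append(0)                             # null terminator
    return bytes(out)


def expand(data: bytes, words: list) -> bytes:
    """Expand tokens ($80 to $FF) back to text, stopping at the null byte."""
    out = bytearray()
    for byte in data:
        if byte == 0:
            break
        out += words[byte & 0x7F] if byte & 0x80 else bytes([byte])
    out.append(0)
    return bytes(out)
```

With the table `[b"IN THE ", b"THE ", b"IN", b"WAS ", b"WORD"]`, the 29 characters of "IN THE BEGINNING WAS THE WORD" tokenise to 13 bytes including the terminator. Reordering the table changes which entries are ever matched, which is worth bearing in mind when choosing the order of a real table.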
= TKNIN
:JOB        To convert a source file of full ASCII text to a
:           destination file of ASCII characters and
:           control codes and "tokens" ($80 to $FF).
:ACTION     REPEAT:
:           [ Call token string match routine.
:             IF match found
:             THEN: [ Use match string index as token. ]
:             ELSE: [ Get source character.
:                     Point to next source byte. ]
:             Write character/token to destination.
:             Point to next destination byte. ]
:           UNTIL terminator addressed.
:           Write terminator to destination.
:CPU        6502
:HARDWARE   Memory containing source text.
:           Destination RAM to contain tokenised text.
:SOFTWARE   "MATCH" - a routine to test for a match between
:           source file substrings and "token" strings,
:           returning token (table position) in X and
:           incrementing source pointer past matched string.
:INPUT      M0,1 addresses start of ASCII source text.
:           The text must terminate with a null (0) byte.
:           M6,7 addresses start of destination area.
:OUTPUT     M0,1 addresses source null terminator.
:           Source text is unchanged.
:           M6,7 addresses destination null terminator.
:           Destination contains tokenised text.
:           All registers changed.
:ERRORS     Destination may overwrite source.
:REG USE    P A X Y
:STACK USE  2
:RAM USE    M0 M1 M6 M7
:LENGTH     37
:CYCLES     Not given.
:CLASS 2    -discreet   *interruptable  *promable
:           *reentrant  *relocatable    -robust
SRCI    = M0            :2-byte stored source address.
DSTI    = M6            :2-byte stored destination address.
TKNIN   JSR MATCH       :Check for token string match.    20 lo hi
        LDY #0          :For index of current bytes.      A0 00
        INX             :Test if token matched and        E8
        BNE TMATCH      :skip if so.                      D0 0A
        LDA (SRCI),Y    :Else, get source character       B1 M0
        INC SRCI        :byte and increment source        E6 M0
        BNE WRITEB      :pointer ready for next           D0 08
        INC SRCI+1      :terminator/token match test.     E6 M1
        BNE WRITEB      :Go write byte to destination.    D0 04
TMATCH  DEX             :Restore correct token table      CA
        TXA             :offset, into A and set bit 7     8A
        ORA #$80        :as non-ASCII token.              09 80
WRITEB  STA (DSTI),Y    :Write ASCII character/token      91 M6
        INC DSTI        :to destination and increment     E6 M6
        BNE TERMCH      :pointer to next character,       D0 02
        INC DSTI+1      :with any carry to hi-byte.       E6 M7
TERMCH  LDA (SRCI),Y    :Check for null terminator        B1 M0
        BNE TKNIN       :and continue if not.             D0 DE
        STA (DSTI),Y    :Else write destination           91 M6
        RTS             :terminator and exit.             60

= TKNOUT
:JOB        To convert a source file consisting of ASCII
:           characters and control codes and "tokens" ($80
:           to $FF) to full ASCII text by reference to a
:           token expansion table.
:ACTION     REPEAT:
:           [ Get character.
:             IF character NOT terminator THEN:
:             [ IF character is a token THEN:
:               [ Convert token to expansion index.
:                 Index correct expansion.
:                 Write expansion bytes - 1 to destination.
:                 Add expansion length - 1
:                 to destination pointer.
:                 Get last expansion character. ]
:               Write character to destination.
:               Point to next destination byte.
:               Point to next source byte. ] ]
:           UNTIL terminator.
:           Write terminator to destination.
:CPU        6502
:HARDWARE   Memory containing source text.
:           Memory containing token expansion table.
:           Destination RAM to contain expanded text.
:SOFTWARE   None.
:INPUT      M0,1 addresses start of destination area.
:           M6,7 addresses start of tokenised source text.
:             The text must terminate with a null (0) byte.
:           M4,5 addresses start of token expansion table.
:             Table length: up to 128 strings.
:             String length: up to 255 characters (bytes).
:             Character (byte) range: 0 to $7F.
:             String terminator: bit 7 set on last byte.
:OUTPUT     M0,1 addresses destination null terminator.
:           Destination contains fully expanded text.
:           M6,7 addresses source null terminator.
:           Source text is unchanged.
:           M4,5 not changed. M2,3 may be changed.
:           All registers changed.
:ERRORS     Arithmetic error if D = 1 (decimal mode).
:           Destination may overwrite source or table.
:REG USE    P A X Y
:STACK USE  None.
:RAM USE    M0 to M7
:LENGTH     88
:CYCLES     Not given.
:CLASS 2    -discreet   *interruptable  *promable
:           *reentrant  *relocatable    -robust
DSTO    = M0            :2-byte stored destination address.
EXPO    = M2            :2-byte for storing expansion address.
TAB     = M4            :2-byte stored expansion table address.
SRCO    = M6            :2-byte stored source address.

TKNOUT  LDY #0        :Index and get currently        A0 00
        LDA (SRCO),Y  :addressed character,           B1 M6
        BNE NONULL    :process if not null            D0 03
        STA (DSTO),Y  :else write destination         91 M0
        RTS           :terminator and exit.           60

NONULL  BPL WRITEA    :Skip to write if ASCII.        10 3C
        LDX TAB       :Else token, so move            A6 M4
        STX EXPO      :token table address            86 M2
        LDX TAB+1     :to expansion address           A6 M5
        STX EXPO+1    :variable.                      86 M3
        AND #$7F      :Clear token flag, bit 7,       29 7F
        TAX           :and move to X as expansion     AA
        BEQ TKNX      :index, skip if expansion 0.    F0 14

TXLP    LDY #-1       :Index from byte 0.             A0 FF

TSLP    INY           :Index next character,          C8
        LDA (EXPO),Y  :load to test it and repeat     B1 M2
        BPL TSLP      :until end of expansion.        10 FB
        TYA           :Add expansion end index +      98
        SEC           :set Carry, as expansion        38
        ADC EXPO      :byte length, to expansion      65 M2
        STA EXPO      :address, giving address        85 M2
        BCC TXLPT     :of 1st byte of next            90 02
        INC EXPO+1    :expansion in EXPO.             E6 M3

TXLPT   DEX           :Repeat until EXPO addresses    CA
        BNE TXLP      :required expansion.            D0 EC
TKNX    LDY #0        :Index current bytes.           A0 00
WRITEX  LDA (EXPO),Y  :Loop, moving characters        B1 M2
        BMI DFREE     :from expansion (if not last    30 05
        STA (DSTO),Y  :byte) to destination.          91 M0
        INY           :As Y never reaches 256 (0)     C8
        BNE WRITEX    :always loop for next byte.     D0 F7
DFREE   TYA           :Y indexes next free byte       98
        CLC           :so, add without carry,         18
        ADC DSTO      :add next free byte index       65 M0
        STA DSTO      :to destination pointer         85 M0
        BCC LASTX     :giving address of next         90 02
        INC DSTO+1    :free byte in DSTO.             E6 M1
LASTX   LDA (EXPO),Y  :Get last expansion byte and    B1 M2
        AND #$7F      :clear end-of-string flag.      29 7F
        LDY #0        :Restore index to 0.            A0 00
WRITEA  STA (DSTO),Y  :Write ASCII/last expansion     91 M0
        INC DSTO      :byte and increment             E6 M0
        BNE SRCINC    :destination pointer ready      D0 02
        INC DSTO+1    :for next character.            E6 M1
SRCINC  INC SRCO      :Increment source pointer       E6 M6
        BNE CONTIN    :to next character,             D0 02
        INC SRCO+1    :with any carry to hi-byte.     E6 M7
CONTIN  SEC           :Ensure branch and continue     38
        BCS TKNOUT    :process until terminator.      B0 A8
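
The expansion logic of TKNOUT can be followed more easily in a high-level model. The sketch below is written in Python rather than 6502 code and is only an illustration: the function names, the helper `make_table` and the three sample words are assumptions, not part of the Sub Set routine. It assumes the table layout given in the datasheet, where each expansion string ends with a byte that has bit 7 set.

```python
def expand_tokens(source: bytes, table: bytes) -> bytes:
    """Expand tokenised text; `source` must end with a null byte."""
    out = bytearray()
    for byte in source:
        if byte == 0:                  # null terminator ends the text
            break
        if byte < 0x80:                # plain ASCII or control code
            out.append(byte)
            continue
        index = byte & 0x7F            # token -> expansion index
        pos = 0
        for _ in range(index):         # skip `index` expansions
            while table[pos] < 0x80:
                pos += 1
            pos += 1                   # step past the flagged last byte
        while table[pos] < 0x80:       # copy all but the last byte
            out.append(table[pos])
            pos += 1
        out.append(table[pos] & 0x7F)  # last byte, end flag cleared
    out.append(0)                      # destination terminator
    return bytes(out)


def make_table(words):
    """Hypothetical helper: pack words, setting bit 7 on each last byte."""
    table = bytearray()
    for word in words:
        raw = word.encode("ascii")
        table += raw[:-1] + bytes([raw[-1] | 0x80])
    return bytes(table)


TABLE = make_table(["PRINT", "GOTO", "THEN"])   # tokens $80, $81, $82
EXPANDED = expand_tokens(b"\x80 X:\x81 10\x00", TABLE)
```

As in the assembly version, finding expansion n means scanning past the n strings before it, so tokens near the end of a long table take longer to expand.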
ESCAPE MESSAGES 


MAKMSG, a 6502 version of a Sub Set routine originally written in
Z-80 code, is similar in concept to TKNOUT but possibly more
versatile, since it is not limited to 128 substrings. Where a
non-ASCII byte sent TKNOUT off on a search through its expansion
table, the same byte causes MAKMSG to read the next 2 bytes as the
address of the substring to be inserted.


MAKMSG acts recursively so the substrings can be nested to any 
depth. The ‘escapes’ don’t necessarily have to be to the start of a 
substring and, if required, may be made to the ending of the last 
word in a long string. However, since each escape consists of the 
escape code itself and the substring address (3 bytes in all), the law of 
diminishing returns soon comes into effect. It simply isn’t worth
using escapes for words like ‘THE’ and ‘IS’ or word endings like 
‘ING’. 


:JOB       To output ASCII characters and control codes
:          ($01 to $7F) as normal but to regard non-ASCII
:          values ($80 to $FF) as "escape" codes followed
:          by the address of a sub-message, to be output
:          before continuation of the main message.
:ACTION    REPEAT:
:          [ Get character.
:            IF character NOT terminator THEN:
:            [ IF character is ASCII
:              THEN: [ Output character. ]
:              ELSE: [ Save pointer.
:                      Get escape address.
:                      Call MAKMSG.
:                      Restore pointer. ]
:              Point to next character. ] ]
:          UNTIL terminator.
:CPU       6502
:HARDWARE  Memory containing message and sub-messages.
:SOFTWARE  Subroutine "OUTCH" to output an ASCII character
:          in A to screen, printer, etc., without changing
:          register and flag values.
:INPUT     M0,1 addresses 1st byte of top level message.
:          Each message and sub-message must terminate with
:          a null (0) byte.
:          The absolute "escape" addresses must be written
:          with low order byte first.
:          Sub-message nesting can be to any depth.
:OUTPUT    M0,1 addresses top level message terminator.
:          A = 0. Y = 0. X and P are unknown.
:ERRORS    Arithmetic error if D = 1 (decimal mode).
:          An "infinite loop" can occur if a low level
:          message "escapes" to one higher up the chain.
:REG USE   P A X Y
:STACK USE 4 * message nesting + 2 + OUTCH stack use.
:RAM USE   M0 M1
:LENGTH    55
:CYCLES    Not given.
:CLASS 2   -discreet *interruptable *promable
:-***--    *reentrant -relocatable -robust

MSG = M0 :Stored address of current message. 

MAKMSG  LDY #0        :Index currently addressed      A0 00
        LDA (MSG),Y   :byte, get in A and skip to     B1 M0
        BNE NOTERM    :process unless null byte,      D0 01
        RTS           :exit at end of current level.  60

NOTERM  BMI ESCP      :Skip if escape code. Else      30 07
        JSR OUTCH     :go output ASCII character.     20 lo hi
        LDA #1        :A = 1 for single character     A9 01
        BNE MSGINC    :increment to message pointer.  D0 1C
ESCP    LDA MSG+1     :Save this level message        A5 M1
        PHA           :pointer to stack through A,    48
        LDA MSG       :so escape address can go       A5 M0
        PHA           :in MSG for recursive call.     48
        INY           :Index and get escape address   C8
        LDA (MSG),Y   :low order byte saved in X      B1 M0
        TAX           :so MSG not yet changed.        AA
        INY           :Index and get escape address   C8
        LDA (MSG),Y   :high order byte in A.          B1 M0
        STA MSG+1     :Write escape address to MSG    85 M1
        STX MSG       :and call MAKMSG to process     86 M0
        JSR MAKMSG    :escape message at new level.   20 lo hi
        PLA           :Escape message finished, so    68
        STA MSG       :restore pointer to this level  85 M0
        PLA           :back to MSG, addressing        68
        STA MSG+1     :escape byte, so A = 3 to       85 M1
        LDA #3        :move it past escape address.   A9 03
MSGINC  CLC           :No carry in to addition.       18
        ADC MSG       :Add 1 or 3 to message          65 M0
        STA MSG       :pointer, moving it to          85 M0
        BCC MAKMSG    :address next character,        90 CD
        INC MSG+1     :repeat always, after taking    E6 M1
        BCS MAKMSG    :care of any carry to hi-byte.  B0 C9
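
The recursion in MAKMSG may be easier to see in a short model. The following Python sketch is an illustration only: the function name, the 64-byte memory image and its addresses are made up for the example, and a Python list stands in for the OUTCH routine. It assumes, as the datasheet states, that each escape code is followed by a 2-byte address with the low order byte first.

```python
def make_message(memory: bytes, address: int, out: list) -> None:
    """Output the null-terminated message at `address`, following escapes."""
    while memory[address] != 0:
        byte = memory[address]
        if byte < 0x80:                       # ASCII: output it directly
            out.append(chr(byte))
            address += 1
        else:                                 # escape: next 2 bytes are an address
            target = memory[address + 1] | (memory[address + 2] << 8)
            make_message(memory, target, out)  # recurse into the sub-message
            address += 3                      # skip escape code and address


# Hypothetical memory image; addresses are chosen only for this example.
memory = bytearray(0x40)

def put(addr: int, data: bytes) -> None:
    memory[addr:addr + len(data)] = data

put(0x00, b"INFORM\x00")
put(0x08, b"\xFF\x00\x00ATION\x00")            # INFORM + ATION
put(0x18, b"TECHN\x00")
put(0x20, b"\xFF\x18\x00OLOGY\x00")            # TECHN + OLOGY
put(0x30, b"\xFF\x08\x00 \xFF\x20\x00\x00")    # two nested escapes

output = []
make_message(bytes(memory), 0x30, output)
phrase = "".join(output)
```

Each level of nesting costs stack space in the real routine, just as it costs a Python call frame here, and a message that escapes to itself would never terminate.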


If you are really short of text storage space then try combining 
TKNOUT and MAKMSG: use $81 to $FE for ‘tokens’ and $FF for 
‘escapes’. Meanwhile, here’s an example of how MAKMSG can be used 
to produce new jargon phrases almost as fast as they develop 
naturally. The lefthand column shows the addresses where the text 
code is located, the middle column gives the text in hexadecimal and 
the righthand column is what MAKMSG makes of it. 


1234:  ... 52 4D 00      INFORM
123C:  ... 49 4F 4E 00   INFORMATION
1245:  ... 4E 00         TECHN
124C:  ... 41 4C 00      TECHNICAL
1254:  ... 4F 47 00      TECHNOLOG
125C:  ... 12 00         TECHNOLOGICAL
1263:  ... 54 00         TECHNOLOGIST
126A:  ... 49 4C 00      DETAIL
1272:  ... 12 59 00      INFORMATION TECHNOLOGY
127A:  ... 12 00         INFORMATION TECHNOLOGIST
1281:  ... 12 00         TECHNICAL DETAIL
1288:  ... FF 3C 12 00   DETAILED INFORMATION
1291:  ... 2D            TECHNOLOGICALLY-
       ... 00            INFORMED


Data Moves 


The first routine in this chapter is not strictly a data move routine. 
TEXT by Andrew Johnson deals with outputting a string of text 
embedded in the program. It is included here because it illustrates 
how to extract data buried inside a program. TEXT can be 
adapted to put the extracted data straight to display RAM, into a 
variable storage area, or anywhere else you can think of. 


TEXT is ideal for printing out short messages to the user, such as 
input prompts or error reports. And having the messages actually in 
the section which generates them can make an assembly language 
program more readable. On the negative side, disassembling 
machine code that contains embedded data is not easy. You could 
make use of the idea as a simple form of ‘program protection’ but 
don’t lose your source program or you are in for a very frustrating 
time. 


= TEXT     Output program embedded text.
:JOB       To output a program embedded text string.
:ACTION    Pull return address as pointer to text.
:          Ensure addressing from text byte 0.
:          REPEAT:
:          [ Increment text pointer.
:            Get text character.
:            IF character NOT terminator THEN:
:            [ Output character. ] ]
:          UNTIL terminator.
:          Push text pointer as return address.
:CPU       6502
:HARDWARE  None.
:SOFTWARE  "OUTPUT" - A routine to output character in A,
:          should not change registers, must not change Y.
:INPUT     Text to be output must follow "JSR TEXT".
:          Text length: indefinite.
:          Information byte/s: None.
:          Text character range: $01 to $FF.
:          Terminator: null (0) byte following text.
:OUTPUT    Return to location following terminator.
:          M0,1 contains address of text terminator.
:          Y = 0. P and A are changed.
:ERRORS    None.
:REG USE   P A Y
:STACK USE "OUTPUT" stack use.
:RAM USE   M0 M1
:LENGTH    31
:CYCLES    45 + text characters * (26 + "OUTPUT" cycles).
:CLASS 2   -discreet *interruptable *promable
:-*****    *reentrant *relocatable *robust

PTR     =  M0         :2-byte store for return address.

TEXT    PLA           :Pull return address from       68
        STA PTR       :stack top and store in         85 M0
        PLA           :page zero for use as pointer   68
        STA PTR+1     :to embedded text.              85 M1
        LDY #0        :Index zeroth byte throughout.  A0 00

TEXTLP  INC PTR       :Move pointer to address next   E6 M0
        BNE READCH    :text character, with any       D0 02
        INC PTR+1     :carry to pointer hi-byte.      E6 M1

READCH  LDA (PTR),Y   :Get current character but      B1 M0
        BEQ FINISH    :end if null terminator.        F0 06
        JSR OUTPUT    :Else output character, then    20 lo hi
        TYA           :ensure branch occurs (Y = 0)   98
        BEQ TEXTLP    :and go get next character.     F0 F0

FINISH  LDA PTR+1     :At end, pointer addresses      A5 M1
        PHA           :terminator, i.e. return        48
        LDA PTR       :address - 1, so will be        A5 M0
        PHA           :correct return address on      48
        RTS           :stack top for RTS exit.        60
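
The return-address trick in TEXT relies on the fact that the 6502 JSR pushes the address of its own last byte, and RTS resumes one byte beyond the address it pulls. The Python sketch below models only that pointer arithmetic. The function name, the tiny `program` image and the call address in it are invented for the illustration, and a list collects characters in place of the OUTPUT routine.

```python
def text(memory: bytes, return_address: int, output) -> int:
    """Model of TEXT; `return_address` is the value JSR pushed
    (the address of the byte just before the embedded text)."""
    pointer = return_address
    while True:
        pointer += 1               # pointer is incremented before each read
        char = memory[pointer]
        if char == 0:              # null terminator ends the text
            break
        output(chr(char))
    return pointer                 # pushed back; RTS resumes at pointer + 1


# Hypothetical program image: JSR TEXT (3 bytes), text, terminator, RTS.
program = b"\x20\x00\x10" + b"READY" + b"\x00" + b"\x60"

printed = []
resume = text(program, 2, printed.append)   # JSR at offset 0 pushes offset 2
```

Here `resume` is the offset of the terminator, so execution would continue at the byte after it, which in this image is the RTS opcode.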

INTELLIGENT TRANSFER 


In a non-intelligent transfer routine which always starts at the lowest 
address, source data will be overwritten before it is moved if the 
destination start address is within the source block. 


IBT by Alex Selby (an improvement to an original 6502 Sub Set
routine, BLKMV) guards against this possibility by transferring data
from the highest address downwards when the destination is higher
than the source. If the source is the higher of the two then the move
starts at the lowest address. The only time this arrangement doesn’t
work is when 16-bit ‘wraparound’ addressing is used and $0000 is
assumed to follow on from $FFFF.
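
The choice of direction can be sketched in a few lines of Python. This is a model of the idea only, not of IBT itself: the function name and the sample buffers are assumptions made for the example.

```python
def block_move(memory: bytearray, source: int, dest: int, length: int) -> None:
    """Copy `length` bytes from `source` to `dest`, safe for overlaps."""
    if dest < source:
        for i in range(length):                # lowest address first
            memory[dest + i] = memory[source + i]
    else:
        for i in range(length - 1, -1, -1):    # highest address first
            memory[dest + i] = memory[source + i]


up = bytearray(b"ABCDEFGH--")
block_move(up, 0, 2, 8)        # destination above source: copy downwards

down = bytearray(b"--ABCDEFGH")
block_move(down, 2, 0, 8)      # destination below source: copy upwards
```

Copying in the wrong direction for either case would duplicate the first few bytes over the rest of the block instead of moving it.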
:JOB       To transfer a block of data so that the moved
:          data does not overwrite the yet uncopied source.
:ACTION    IF destination < source
:          THEN:
:          [ Address first byte of source and destination.
:            REPEAT:
:            [ Move byte from source to destination.
:              Increment source and destination pointers. ]
:            UNTIL transfer completed. ]
:          ELSE:
:          [ Address last byte of source and destination.
:            REPEAT:
:            [ Decrement source and destination pointers.
:              Move byte from source to destination. ]
:            UNTIL transfer completed. ]
:CPU       6502
:HARDWARE  Source and destination RAM.
:SOFTWARE  None.
:INPUT     M0,1 addresses first byte of source.
:          M2,3 addresses first byte of destination.
:          M4,5 = byte length of data block.
:OUTPUT    Block at destination.
:          Pointer hi-bytes, M1 and M3, are changed.
:          Source block may be overwritten.
:          No registers are changed.
:ERRORS    Source data could be overwritten prior to the
:          transfer if 16-bit "wraparound" addressing is
:          used (i.e. $0000 follows $FFFF).
:REG USE   None.
:STACK USE 4
:RAM USE   M0 to M5
:LENGTH    83
:CYCLES    Not given.
:CLASS 2   -discreet *interruptable *promable
:-***--    *reentrant -relocatable -robust

SRCE    =  M0         :Stored source start address.
DEST    =  M2         :Stored destination start address.
LEN     =  M4         :Stored byte-length of source block.

IBT     PHP           :Save                           08
        PHA           :all                            48
        TXA           :registers                      8A
        PHA           :on                             48
        TYA           :stack                          98
        PHA           :and ensure                     48
        CLD           :binary arithmetic.             D8
        LDX LEN+1     :Get 256-byte block count.      A6 M5


:....Test relative positions of source and destination.
        SEC           :No borrow going into           38
        LDA SRCE      :source - destination           A5 M0
        SBC DEST      :comparison.                    E5 M2
        LDA SRCE+1    :                               A5 M1
        SBC DEST+1    :If source below dest           E5 M3
        BCC RVRS      :then move from end down.       90 18
:....Initialise index and count for 1st to last byte move.
        LDY #0        :Index from lowest byte.        A0 00
        INX           :Correct for DEC end check.     E8
:....1st to last byte move in 256-byte blocks.
FWDLP   CPY LEN       :Test possible end of           C4 M4
        BNE FWDTFR    :block, skip if not there.      D0 03
        DEX           :Else dec length hi-byte        CA
        BEQ EXIT      :and end if block end.          F0 2E
FWDTFR  LDA (SRCE),Y  :Get source byte into           B1 M0
        STA (DEST),Y  :destination and move index     91 M2
        INY           :to next byte, looping till     C8
        BNE FWDLP     :end of 256-byte index          D0 F2
        INC SRCE+1    :then inc base hi-bytes         E6 M1
        INC DEST+1    :for next 256-byte block        E6 M3
        JMP FWDLP     :and repeat.                    4C lo hi
:....Note that carry flag is clear on branch to RVRS.
:....Move base addresses to last page of block and
:....initialise index and count for last to 1st byte move.
RVRS    LDY LEN       :Index from highest byte.       A4 M4
        TXA           :Add hi-byte of byte length     8A
        ADC SRCE+1    :to source and destination      65 M1
        STA SRCE+1    :base addresses so they         85 M1
        TXA           :index the highest page of      8A
        CLC           :the data blocks.               18
        ADC DEST+1    :                               65 M3
        STA DEST+1    :                               85 M3
        INX           :Correct for DEC end check.     E8
:....Last to 1st byte move in 256-byte blocks.
RVRSLP  TYA           :Test for end of index          98
        BEQ HIDEC     :256-byte block, skip if so.    F0 08
RVTFR   DEY           :Pre-dec index and move one     88
        LDA (SRCE),Y  :byte from source to            B1 M0
        STA (DEST),Y  :destination and                91 M2
        JMP RVRSLP    :repeat till 1st byte done.     4C lo hi
HIDEC   DEC SRCE+1    :End of 256-byte block so       C6 M1
        DEC DEST+1    :dec base address hi-bytes to   C6 M3
        DEX           :index next lower block and     CA
        BNE RVTFR     :repeat if count not done.      D0 F1
EXIT    PLA           :Restore                        68
        TAY           :all                            A8
        PLA           :registers                      68
        TAX           :from                           AA
        PLA           :stack                          68
        PLP           :and                            28
        RTS           :exit, transfer done.           60


MEMORY BLOCK ROTATION 


RRL by R. G. Cath performs a long rotation on a data block. You 
must input the address of the lowest byte in the block, the address of 
the byte immediately following the block and the byte distance each 
byte in the block has to be rotated. No memory outside the block is 
affected by the routine since any hop beyond the end of the block is 
trapped and wrapped around to become a hop into the lower end of
the block.


RRL could be a useful subroutine for many programs — an insertion 
sort of multi-byte elements and a word processor are just two that 
spring to mind. 
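
The cycle-following method that RRL uses is set out in the ACTION section of its datasheet, and it can be tried out in a high-level form first. The Python sketch below follows that pseudocode, not the assembly listing; the function name and the restriction to distances shorter than the block are assumptions made to keep the model simple.

```python
def rotate_block(memory: bytearray, start: int, end: int, move: int) -> None:
    """Rotate memory[start:end] upwards by `move` bytes, in place.
    `end` is the address of the byte following the block."""
    if not 0 <= move < end - start:
        raise ValueError("this sketch expects 0 <= move < block length")
    pointer = start
    wraparounds = 0
    while wraparounds != move:
        cycle_start = pointer
        temp = memory[pointer]                 # byte carried between hops
        while True:
            while True:
                # Exchange the indexed byte with the carried byte, then hop.
                memory[pointer], temp = temp, memory[pointer]
                pointer += move
                if pointer >= end:             # hopped past the block end
                    break
            wraparounds += 1
            pointer = pointer - end + start    # wrap back into the block
            if pointer == cycle_start:
                break
        memory[pointer] = temp                 # close the cycle
        pointer += 1


block = bytearray(b"ABCDEFG")
rotate_block(block, 0, 7, 3)
```

Every byte hops the same distance, so the whole block wraps around the end exactly `move` times in total, which is why the wraparound count serves as the end test.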


:JOB       To rotate a block of memory, each byte moving a
:          given distance upwards in memory, with all
:          distances past the end of the block recalculated
:          from the start of the block.
:ACTION    ON ERROR: [ Exit, overflow flag set. ]
:          Pointer = block-start. Wraparounds = 0.
:          WHILE Wraparounds NOT EQUAL TO movelength:
:          [ Cycle-start = pointer.
:            Move indexed byte to temp-store.
:            REPEAT:
:            [ REPEAT:
:              [ Exchange indexed byte with temp-store.
:                Pointer = pointer + movelength. ]
:              UNTIL Pointer past block-end.
:              Wraparounds = wraparounds + 1.
:              Pointer = pointer - (block-end + 1).
:              Pointer = pointer + block-start. ]
:            UNTIL Pointer = Cycle-start.
:            Move temp-store to indexed byte.
:            Pointer = pointer + 1. ]
:          Set rotation completed flag.
:CPU       6502
:HARDWARE  RAM containing block to rotate.
:SOFTWARE  None.
:INPUT     M0,1 addresses 1st byte in block.
:          M2,3 addresses byte following block.
:          M4,5 is the byte distance to rotate each byte.
:          (M2,3) + (M4,5) - 1 should not exceed $FFFF.
:OUTPUT    Registers and M6 to MB changed.
:          M0 to M5 not changed.
:          C = 0: Invalid input, arithmetic overflow.
:                 Block may be partly rotated.
:          C = 1: Rotation completed.
:ERRORS    Arithmetic error if D = 1 (decimal mode).
:REG USE   P A X Y
:STACK USE 2
:RAM USE   M0 to MB
:LENGTH    121
:CYCLES    Not given.
:CLASS 2   -discreet *interruptable *promable
:-****-    *reentrant *relocatable -robust

BST     =  M0         :2-byte stored block start address.
BND     =  M2         :2-byte stored block end address + 1.
BSD     =  M4         :2-byte stored byte distance to move.
WRP     =  M6         :2-byte store for wraparound counter.
PTR     =  M8         :2-byte store for current pointer.
CST     =  MA         :2-byte store for cycle start address.

RRL     LDY #0        :Zero index throughout.         A0 00
        STY WRP       :Set wraparound count to        84 M6
        STY WRP+1     :zero initially.                84 M7
        LDA BST       :Set pointer to                 A5 M0
        STA PTR       :start of block                 85 M8
        LDA BST+1     :initially.                     A5 M1
        STA PTR+1     :                               85 M9

BCYCLE  LDA BSD       :Cycle start.                   A5 M4
        CMP WRP       :Compare number of              C5 M6
        BNE BSTART    :wraparounds with distance      D0 08
        LDA BSD+1     :to move each byte.             A5 M5
        BMI EXITNV    :Exit, error, if move negative. 30 5D
        CMP WRP+1     :Rotation done if wraparounds   C5 M7
        BEQ EXITRD    := distance, else continue.     F0 5B

BSTART  LDA PTR       :Save pointer address at start  A5 M8
        STA CST       :of cycle for test when cycle   85 MA
        LDA PTR+1     :ends (i.e. when wrapped around A5 M9
        STA CST+1     :pointer is back at start).     85 MB
        LDA (PTR),Y   :Initially move 1st byte to     B1 M8
        PHA           :stack, ready for exchanges.    48

NXTHOP  LDA (PTR),Y   :Exchange indexed               B1 M8
        TAX           :byte with that                 AA
        PLA           :from last position             68
        STA (PTR),Y   :stored                         91 M8
        TXA           :temporarily                    8A
        PHA           :on stack top.                  48
NXTPAS  CLC           :Prepare to add, no carry in.   18
        LDA PTR       :Add shift distance to          A5 M8
        ADC BSD       :current pointer, moving it     65 M4
        STA PTR       :to address byte at             85 M8
        LDA PTR+1     :destination for byte now       A5 M9
        ADC BSD+1     :stored on stack.               65 M5
        STA PTR+1     :But exit in error if "hopping" 85 M9
        BCS EXITPL    :gone past 64K memory.          B0 36

        SEC           :Prepare to subtract, no borrow. 38
        LDA PTR       :Test relative position of      A5 M8
        SBC BND       :current pointer and block end, E5 M2
        TAX           :setting Carry if pointer lower, AA
        LDA PTR+1     :and temporarily retaining      A5 M9
        SBC BND+1     :difference in AX. If pointer   E5 M3
        BCC NXTHOP    :lower then continue "hopping". 90 DD

        INC WRP       :Else gone past block end, and  E6 M6
        BNE PRESET    :AX is overshoot. So increment  D0 02
        INC WRP+1     :wraparound counter; then reset. E6 M7

PRESET  PHA           :Save overshoot hi-byte         48
        TXA           :while processing lo-byte.      8A
        CLC           :Prepare to add, no carry in.   18
        ADC BST       :Add block start to overshoot   65 M0
        STA PTR       :resetting pointer within block. 85 M8
        PLA           :Restore overshoot hi-byte and  68
        ADC BST+1     :add block start hi-byte. Okay  65 M1
        STA PTR+1     :if pointer in block but exit as 85 M9
        BCS EXITPL    :error if past 64K memory.      B0 16

        CMP CST       :At end of pass, when wraparound
        BNE NXTPAS    :has occurred, test if at start
        LDA PTR       :location. If so then cycle has
        CMP CST       :ended but if not then another
        BNE NXTPAS    :pass needed.

        PLA           :At end of cycle, when pointer
        STA (PTR),Y   :addresses start location, move
                      :byte from stack to replace 1st
                      :byte moved this cycle,

        INC PTR       :Move pointer to next location
        BNE BCYCLE    :and go test if another cycle
        INC PTR+1     :is needed.
        ... BCYCLE

EXITPL  PLA           :Tidy up stack before exit.     68
EXITNV  CLC           :Reset Carry to show rotation   18
        RTS           :invalid/incomplete, exit.      60

EXITRD  SEC           :Set Carry to show rotation     38
        RTS           :completed successfully, exit.  60
MATRIX TRANSPOSITION


There are two ways that you can store a two dimensional array, or 
matrix, of single-byte elements in linearly addressed memory. Either 
the bytes in each row are stored contiguously or else each successive 
location holds the next byte down the column. For example, the 
matrix: 


A B C D
E F G H
I J K L

can be stored as ‘A B C D E F G H I J K L’ (sequential row
storage) or as ‘A E I B F J C G K D H L’ (sequential column
storage).


It doesn’t usually matter which way matrices are stored since any 
byte can be accessed fairly easily by using a simple multiplication for 
one dimension and an addition for the other. Sometimes, however, it 
does matter very much because multiplication on the 6502 is not a 
particularly fast operation. Occasionally, especially in graphics 
applications, a matrix has to be ordered in both ways for quicker 
access. 


TRANS by Vernon Webb will turn, or transpose, a matrix from one 
form of storage to the other. The most interesting feature is that the 
matrix is transposed in its own space, without recourse to any other 
storage area except a few bytes of page zero used for variables. This 
eliminates any problems that can result from working in a small 
amount of RAM but the method of TRANS is very slow — for large 
arrays at least. For a matrix composed of R rows by C columns, the 
number of elements that have to be moved is, 


(RC + 4) * (R - 1) * (C - 1) / 4.


This means that only 5 bytes have to be moved in a 2 by 3 matrix but 
the number increases rapidly. 105 elements have to be moved in a 4 
by 6 matrix (shown below) and the massive number of 14,625 bytes 
have to be shuffled around to transpose a 16 by 16 matrix — an 
average of just over 57 moves for each byte. Perhaps it is just as well 
that TRANS doesn’t attempt to transpose matrices of more than 256 
elements! 


The method of TRANS is really quite simple. It rotates a block of
elements extending from the position where the next
column-sequential byte is to go to that byte’s current, row-sequential
position. The byte is put in its rightful place, the bytes below it 
being shuffled up to make room — rather like the method used in 
insertion sorts. 
This is demonstrated below on a small 4 by 6 matrix. The top line
shows the input row-sequential storage and the bottom line is the 
column-sequential transposed result. The intervening 15 lines show 
all intermediate states with the block from ‘ED’ to ‘ES’ (about to be 
rotated) highlighted as upper-case letters. 


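AS ROWS:     <    1    > <    2    > <    3    > <    4    >
             a b c d e f g h i j k l m n o p q r s t u v w x

CI RI ED ES
 0  1  1  6  a B C D E F G h i j k l m n o p q r s t u v w x
 0  2  2 12  a g B C D E F H I J K L M n o p q r s t u v w x
 0  3  3 18  a g m B C D E F H I J K L N O P Q R S t u v w x
 1  1  5  9  a g m s b C D E F H i j k l n o p q r t u v w x
 1  2  6 14  a g m s b h C D E F I J K L N o p q r t u v w x
 1  3  7 19  a g m s b h n C D E F I J K L O P Q R T u v w x
 2  1  9 12  a g m s b h n t c D E F I j k l o p q r u v w x
 2  2 10 16  a g m s b h n t c i D E F J K L O p q r u v w x
 2  3 11 20  a g m s b h n t c i o D E F J K L P Q R U v w x
 3  1 13 15  a g m s b h n t c i o u d E F J k l p q r v w x
 3  2 14 18  a g m s b h n t c i o u d j E F K L P q r v w x
 3  3 15 21  a g m s b h n t c i o u d j p E F K L Q R V w x
 4  1 17 18  a g m s b h n t c i o u d j p v e F K l q r w x
 4  2 18 20  a g m s b h n t c i o u d j p v e k F L Q r w x
 4  3 19 22  a g m s b h n t c i o u d j p v e k q F L R W x

AS COLUMNS:  <  1  > <  2  > <  3  > <  4  > <  5  > <  6  >
             a g m s b h n t c i o u d j p v e k q w f l r x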
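
The rotation method and the move count quoted earlier can both be checked with a short model. The Python sketch below is an illustration rather than a translation of TRANS: the function names are assumptions, and a Python list stands in for the matrix bytes. It uses the source and destination indexes given in the ACTION section of the datasheet and counts one move for each element shifted or placed.

```python
def transpose_in_place(matrix: list, rows: int, cols: int) -> int:
    """Turn row-sequential storage into column-sequential storage,
    returning the number of element moves made."""
    moves = 0
    for ci in range(cols - 1):                 # columns 0 to cols - 2
        for ri in range(1, rows):              # rows 1 to rows - 1
            source = ci * rows + ri * cols - ri * ci
            dest = ci * rows + ri
            element = matrix[source]
            for i in range(source, dest, -1):  # shift dest..source-1 up one
                matrix[i] = matrix[i - 1]
            matrix[dest] = element             # drop element into its place
            moves += source - dest + 1
    return moves


def predicted_moves(rows: int, cols: int) -> int:
    """Move count from the formula (RC + 4) * (R - 1) * (C - 1) / 4."""
    return (rows * cols + 4) * (rows - 1) * (cols - 1) // 4


letters = list("abcdefghijklmnopqrstuvwx")
count = transpose_in_place(letters, 4, 6)
result = "".join(letters)
```

For the 4 by 6 example, `result` matches the column-sequential line of the table and `count` agrees with the formula.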
= TRANS    Own-space matrix transposition.
:JOB       To transpose a 2-dimensional array, or matrix,
:          in its own RAM space, storage row-after-row
:          becoming column-after-column.
:ACTION    FOR col-index = 0 to cols - 2:
:          [ FOR row-index = 1 to rows - 1:
:            [ Source = Matrix-start +
:                       (col-index * rows) +
:                       (row-index * cols) -
:                       (row-index * col-index).
:              Dest = Matrix-start + row-index +
:                     (col-index * rows).
:              (Temp) = (source).
:              (Dest + 1, source) = (dest, source - 1).
:              (Dest) = (temp). ] ]
:HARDWARE  RAM containing matrix.
:SOFTWARE  None.
:INPUT     M0,1 addresses first byte of matrix.
:          M2 contains number of rows in matrix.
:          M3 contains number of columns in matrix.
:          Maximum matrix elements: 256.
:OUTPUT    M0 to M3 are unchanged. M4 to M8 changed.
:          All registers unchanged.
:ERRORS    Arithmetic error if D = 1 (decimal mode).
:          Incorrect transposition if M2 * M3 > 256.
:REG USE   None.
:STACK USE 4
:RAM USE   M0 to M8.
:LENGTH    102
:CYCLES    Not given.
:CLASS 2   -discreet *interruptable *promable
:-***--    *reentrant -relocatable -robust

MTX     =  M0         :2-byte stored matrix start address.
ROWS    =  M2         :1-byte stored number of rows.
COLS    =  M3         :1-byte stored number of columns.
CI      =  M4         :1-byte store for column index.
CR      =  M5         :1-byte store for CI * ROWS by adding.
RI      =  M6         :1-byte store for row index.
ES      =  M7         :1-byte store for element source index.
ED      =  M8         :1-byte store for element dest. index.

TRANS   PHP           :Save                           08
        PHA           :flags                          48
        TYA           :and                            98
        PHA           :registers                      48
        TXA           :for use                        8A
        PHA           :in TRANS.                      48
        LDA #-1       :Ensure column index is 0       A9 FF
        STA CI        :when first INC'd in NEWCOL.    85 M4
        SEC           :So no borrow in to subtract.   38
        LDA #0        :Ensure CI*ROWS variable is 0   A9 00
        SBC ROWS      :when first incremented by      E5 M2
        STA CR        :ROWS in NEWCOL.                85 M5

NEWCOL  INC CI        :Index next column, first row.  E6 M4
        CLC           :Ensure no carry in to add.     18
        LDA CR        :Compute offset of first row    A5 M5
        ADC ROWS      :of next column from MTX,       65 M2
        STA CR        :(i.e. column index * ROWS).    85 M5
        STA ED        :Also destination offset.       85 M8
        LDA #0        :Ensure row index is 1 when     A9 00
        STA RI        :first INC'd in NEWROW.         85 M6

NEWROW  INC RI        :Index next row, this column.   E6 M6
        LDA CR        :Do: ES=CR-(RI*CI)+(RI*COLS),   A5 M5
        SEC           :Prepare for A-(RI*CI) by       38
        LDX CI        :repeated subtraction.          A6 M4

SUBLP   BEQ SUBDON    :Skip out when CI loops done.   F0 06
        SBC RI        :Subtract RI once for           E5 M6
        DEX           :every CI.                      CA
        JMP SUBLP     :Loop to exit on CI done test.  4C lo hi

SUBDON  CLC           :Prepare for A+(RI*COLS) by     18
        LDX RI        :repeated addition.             A6 M6
ADDLP   BEQ ADDDON    :Skip out when RI loops done.   F0 06
        ADC COLS      :Add COLS once for              65 M3
        DEX           :every RI.                      CA
        JMP ADDLP     :Loop to exit on RI done test.  4C lo hi

ADDDON  STA ES        :New source index. Next dest    85 M7
        INC ED        :1 up on last positioned byte.  E6 M8
        TAY           :Index and get source element   A8
        LDA (MTX),Y   :save in X for destination      B1 M0
        TAX           :store after rest shifted up.   AA

SHFTLP  DEY           :Shift up block below           88
        LDA (MTX),Y   :current source by one byte,    B1 M0
        INY           :ED to ES-1 going into          C8
        STA (MTX),Y   :ED+1 to ES, leaving ED free    91 M0
        DEY           :to receive current element.    88
        CPY ED        :Repeat until destination       C4 M8
        BNE SHFTLP    :byte is indexed by Y.          D0 F5

        TXA           :Position current element       8A
        STA (MTX),Y   :at destination.                91 M0

        LDX ROWS      :Test for complete positioning  A6 M2
        DEX           :of currently indexed column    CA
        CPX RI        :when rows indexed 1 to last    E4 M6
        BNE NEWROW    :have been processed.           D0 C9

        LDX COLS      :Test for end of transposition  A6 M3
        DEX           :when columns indexed 0 to      CA
        DEX           :last - 1 have been processed   CA
        CPX CI        :(last column automatically     E4 M4
        BNE NEWCOL    :okay when others okay).        D0 B2

        PLA           :Restore                        68
        TAX           :registers                      AA
        PLA           :and                            68
        TAY           :flags                          A8
        PLA           :used in                        68
        PLP           :TRANS.                         28
        RTS           :Exit, matrix transposed.       60
Reducing Errors





There is an ancient (pre-decimalisation) story of how the urgent 
message, ‘Send reinforcements, we are going to advance.’ was 
passed by word of mouth along the trenches. When the message 
reached Battalion HQ, the top brass were somewhat taken aback 
by the request, ‘Lend three and fourpence, we are going to a 
dance.’ 


I don’t know if the advance (of three shillings and four pennies) 
was made but the story does illustrate vividly the corruption that 
can occur to ‘soft’ data. 


Errors in data may have several causes. The most common ones are a 
slip of the finger during input and corruption during transmission or 
magnetic storage. Such errors may be no more than a time 
consuming frustration to the computer hobbyist but in the business 
world they can be very costly. 


Many methods have been developed to ensure the validity and 
integrity of important data. Some merely indicate that an error has 
occurred whilst others can identify the errors and correct them. 
None can guarantee 100% accuracy. 


CHECKSUMS 


The checksum technique is mainly used to detect mistakes made 
when typing in numbers. An extra digit (or even a letter) called the 
check digit is calculated from the sum of all the digits in the number 
and appended to it. Thereafter, whenever the number is typed in, 
the checksum digit is recalculated and compared to the appended 
check digit. If the two match then the actual number is assumed to 
be valid. Any disparity means that an error has been made and the 
number must be re-entered. 


Simply adding the digits in the number string is not the best method 
of calculating the check digit as it allows too many errors to pass 
unnoticed. For example, 402173Q with 4 and 2 transposed on input 
would be erroneously accepted as 204173Q. The digits of both 
numbers sum to 17: 


To guard against transposition errors, most checksums assign a
unique weighting to each digit position. Lance A. Leventhal in his
book ‘6502 Assembly Language Programming’ describes a method
known as ‘Aligned 1, 3, 7 Mod 10’ and sets it as an exercise.
Calculating the checksums of 402173 and 204173 gives the following
results.


Checksum       : 1*4 + 3*0 + 7*2 + 1*1 + 3*7 + 7*3 = 61.
Checksum digit : (61 Mod 10) = 1.

Checksum       : 1*2 + 3*0 + 7*4 + 1*1 + 3*7 + 7*3 = 73.
Checksum digit : (73 Mod 10) = 3.


Transposition of 4 and 2 when inputting 4021731 (the number 
402173 with appended check digit 1) would be detected but the 
method is not foolproof. Other errors, such as transposition of 0 and 
7 in the same number, would not be found. There are checksum 
weighting sequences that are more successful in trapping errors. 
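
The arithmetic above is easy to reproduce. The following Python sketch is an illustration only, with a made-up function name; it assumes the weights 1, 3, 7 are applied in turn starting from the leftmost digit, as in the worked figures.

```python
def check_digit(number: str) -> int:
    """Check digit by the 'Aligned 1, 3, 7 Mod 10' weighting."""
    weights = (1, 3, 7)
    total = sum(int(digit) * weights[i % 3] for i, digit in enumerate(number))
    return total % 10


original = check_digit("402173")     # weighted sum 61
swapped_42 = check_digit("204173")   # 4 and 2 transposed, weighted sum 73
swapped_07 = check_digit("472103")   # 0 and 7 transposed, weighted sum 61
```

The 0 and 7 that were exchanged both sit under the weight 3, so the weighted sum does not change and the error goes unnoticed.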


CHKSUM is my solution to the ‘Aligned 1, 3, 7 Mod 10’ problem set 
by Leventhal. It appeared in Sub Set as CADIDS in a double- 
datasheet containing both Z-80 and 6502 coding of the same method. 


:JOB       To calculate the checksum digit of a BCD string
:          using the ‘Aligned 1, 3, 7 Mod 10’ method.
:ACTION    Clear checksum, (CHK). Clear weighting, (WGT).
:          FOR each byte:
:          [ FOR each digit:
:            [ IF WGT = 7 THEN: [ Clear WGT. ]
:              WGT = WGT * 2 + 1
:              FOR WGT: [ CHK = CHK + digit ] ] ]
:          CHK = CHK Mod 10.
:CPU       6502
:HARDWARE  Memory containing BCD string.
:SOFTWARE  None.
:INPUT     M0,1 addresses BCD string first (hi-order) byte.
:          M2 = Number of bytes in string.
:          (A leading or trailing zero must be used for a
:          string with an odd number of digits.)
:OUTPUT    M0,1 addresses string + 1. M2 = Checksum digit.
:          Y = 0. X = 4. A and P are unchanged.
:ERRORS    No check made for valid BCD digits in number.
:REG USE   X Y
:STACK USE 5
:RAM USE   M0 M1 M2
:LENGTH    77
:CYCLES    134 + average of 83 per digit.
:CLASS 2   -discreet *interruptable *promable
:-****-    *reentrant *relocatable -robust


NUM     =  M0         :2-byte stored address of number.
CHK     =  M2         :Number byte-length (input),
                      :check digit (output).
CNT     =  M3         :For number byte count.
WGT     =  M4         :For computed digit weighting.
DGT     =  M5         :For storing each successive digit.

CHKSUM  PHP           :Save flags and set index for   08
        LDX #4        :loop which saves 4 bytes...    A2 04
PUSHLP  PHA           :Save A first push, then        48
        LDA CHK-1,X   :M5, M4 and M3.                 B5 M1
        DEX           :Leave M2 (byte length) in A    CA
        BNE PUSHLP    :and X = 0.                     D0 FA
        STA CNT       :Set byte count = byte length.  85 M3
        TXA           :Initially clear checkdigit,    8A
        TAY           :zeroise NUM index and          A8
        STA WGT       :initialise weighting to 0.     85 M4
        SED           :Ensure decimal arithmetic.     F8
NXTBYT  LDX #2        :Count 2 digits per byte.       A2 02
NXTDIG  STA CHK       :Store partial check digit.     85 M2
        LDA (NUM),Y   :Get currently addressed byte   B1 M0
        CPX #2        :and test if currently on low   E0 02
        BNE NEWDIG    :order digit (skip, it's okay)  D0 04
        LSR A         :else on high order digit       4A
        LSR A         :which has to be moved          4A
        LSR A         :down to low order              4A
        LSR A         :before it can be added.        4A
NEWDIG  STA DGT       :Store digit to page 0 for adds. 85 M5
        LDA WGT       :Get last weighting and         A5 M4
        CMP #7        :test if at top limit,          C9 07
        BNE NEWWGT    :skipping if not,               D0 01
        TYA           :else reset to 0 (Y = 0).       98
NEWWGT  SEC           :Rotate left with Carry set so  38
        ROL A         :weighting = weighting * 2 + 1. 2A
        STA WGT       :Store new weighting.           85 M4
        TAY           :Weighting is add loop count.   A8
        LDA CHK       :Get partial check digit and,   A5 M2
ADDWD   CLC           :ensure no carry to mess it up, 18
        ADC DGT       :add digit * weighting by       65 M5
        DEY           :repeated decimal addition.     88
        BNE ADDWD     :Leave Y = 0 on exit from loop. D0 FA
        DEX           :Repeat for                     CA
        BNE NXTDIG    :two digits per byte.           D0 DB
        INC NUM       :Address next byte of           E6 M0
        BNE NBTEST    :number string, take care of    D0 02
        INC NUM+1     :any carry to hi-byte.          E6 M1
NBTEST  DEC CNT       :Repeat for all bytes in        C6 M3
        BNE NXTBYT    :number string.                 D0 CF
        AND #$0F      :Mod 10. A = check digit.       29 0F
PULLLP  STA CHK,X     :Loop, storing check digit first 95 M2
        PLA           :then, pulling and restoring    68
        INX           :M3, M4 and M5, then            E8
        CPX #4        :finally restoring A from stack E0 04
        BNE PULLLP    :on last iteration.             D0 F8
        PLP           :Restore flags                  28
        RTS           :and exit, check digit found.   60
PARITY CODING 


ECAL and EFIX by John Kerr use a form of parity coding based on 
the position of each bit in a block of data. Hence, they can be used 
not only to detect a bit inversion that may have occurred in the block 
but also to locate and correct it. 


Before storage or transmission, an error correction byte (ecb) is 
calculated for every 31 bytes of data and appended to make a neat 
32-byte block. On retrieval or receipt of the data, a new ecb is 
calculated and exclusive-ORed with the appended ecb. If all bits in 
the resultant ‘correction code’ are zero then the data can be assumed 
valid. If not, then any value above 7 in the correction code gives the 
position of a single bit in the data block. EFIX assumes this bit has 
been inverted and corrects the data by re-inverting the bit. 


The method can only cope with one corrupt bit in each data block. 
More than one inversion could together have no effect on the parity 
coding of the ecb or even fool EFIX into actually changing the state 
of a valid bit. And valid data could be ‘corrected’ by EFIX were a bit 
inversion to occur in the ecb itself. However, according to John, 
ECAL and EFIX can correct about 95% of errors in a system where 
the probability of bit error is less than 0.4%. 
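

The position-coded parity scheme is easy to try out away from the 6502. The following is only a minimal Python sketch of the method as described in the ACTION notes of the two routines, not a translation of the listings; the function names ecal and efix and the sample message are chosen here for illustration.

```python
def ecal(data: bytes) -> int:
    """Return the error correction byte (ecb) for 1 to 31 data bytes."""
    if not 0 < len(data) < 32:
        raise ValueError("block length must be 1 to 31 bytes")
    bit_count = len(data) * 8 + 7      # position code of bit 7 of the first byte
    ecb = 0
    for byte in data:
        for bit in range(7, -1, -1):   # bits 7 down to 0 of each byte
            if byte & (1 << bit):
                ecb ^= bit_count       # fold the position of every set bit into the ecb
            bit_count -= 1
    return ecb


def efix(block: bytes) -> bytes:
    """Take data plus appended ecb; return the data with any indicated bit re-inverted."""
    data = bytearray(block[:-1])
    code = ecal(bytes(data)) ^ block[-1]       # correction code
    if code >= 8:                              # values 0 to 7 do not address a data bit
        index = len(data) - (code >> 3)        # bits 7 to 3 count bytes back from the block end
        if index >= 0:
            data[index] ^= 1 << (code & 7)     # bits 2 to 0 select the bit in that byte
    return bytes(data)


message = b"SUB SET"
sent = message + bytes([ecal(message)])        # data block with appended ecb
damaged = bytearray(sent)
damaged[2] ^= 0x10                             # invert one bit "in transit"
recovered = efix(bytes(damaged))
```

Because each data bit is tagged with a unique position code of 8 or more, a single inversion changes the recalculated ecb by exactly that code, which is why the exclusive-OR of the two ecbs points straight at the damaged bit.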


:JOB       To calculate a single byte parity code, capable
:          of being used to detect and correct a single bit
:          error in a 1 to 31 byte data block.
:ACTION    IF block byte length > 0 AND < 32 THEN:
:          [ Parity mask bit-count = bytes * 8 + 7.
:            Clear error correction byte (ecb).
:            Index first data byte.
:            WHILE bit-count > 7:
:            [ Get current data byte.
:              FOR bits 7 to 0 of data byte:
:              [ IF current bit = 1 THEN:
:                [ ecb = ecb EOR bit-count. ]
:                Bit-count = bit-count - 1. ]
:              Index next data byte. ] ]
:CPU       6502
:HARDWARE  Memory containing data block.
:SOFTWARE  None.
:INPUT     M0,1 addresses first byte of the data block.
:          Y = no. of bytes in data block (max. 31).
:OUTPUT    A, Y and M0,1 are unchanged.
:          C = 1: aborted (Y = 0 or Y > 31). X unchanged.
:          C = 0: X contains error correction byte (ecb).
:ERRORS
:REG USE
:STACK USE
:RAM USE
:LENGTH
:CYCLES
:CLASS 1   *discreet *interruptable *promable
:******    *reentrant *relocatable *robust

DATA    = M0            :2-byte stored address of data block.
BCNT    = M2            :For storing parity mask bit-count.
BYTE    = M3            :For storing current data byte.

ECAL    CPY #32         :Abort if data block length       C0 20
        BCS ENDRTS      :is greater than 31.              B0 35
        PHA             :Else, save A                     48
        LDA BCNT        :and M2                           A5 M2
        PHA             :for use in ECAL.                 48
        TYA             :Get block length in A, testing   98
        SEC             :if zero, set abort flag and      38
        BEQ ENDPLL      :abort if length is zero.         F0 2B
        ASL A           :Else okay, so multiply by 8      0A
        ASL A           :to convert byte length to        0A
        ASL A           :bit length and add 7 to form     0A
        ORA #7          :parity mask bit-count in M2      09 07
        STA BCNT        :(0 means bit 0 of block + 1).    85 M2
        LDA BYTE        :Save M3 for use as store for     A5 M3
        PHA             :process of current byte.         48
        LDX #0          :Clear error correction byte.     A2 00
        LDY #0          :Index data from first byte.      A0 00
BYTELP  LDA (DATA),Y    :Get currently indexed byte to    B1 M0
        STA BYTE        :page zero for processing.        85 M3
        TXA             :Move ecb to A and set up count   8A
        LDX #8          :of 8 in X for bits in this byte. A2 08
BITLP   ASL BYTE        :Move next data bit to Carry.     06 M3
        BCC NXTBIT      :"ecb EOR 0" if bit reset, else   90 02
        EOR BCNT        :"ecb EOR bit-count".             45 M2
NXTBIT  DEC BCNT        :Prepare bit-count for next bit.  C6 M2
        DEX             :Mark off one bit processed       CA
        BNE BITLP       :and repeat for all 8 this byte.  D0 F5
        INY             :Index next data byte.            C8
        LDX BCNT        :Test for block end, resetting    A6 M2
        CPX #8          :Carry when bit-count = 7.        E0 08
        TAX             :Move ecb to X and repeat         AA
        BCS BYTELP      :for all data block.              B0 E6
        PLA             :Restore M3                       68
        STA BYTE        :from stack.                      85 M3
ENDPLL  PLA             :Restore M2 and A either          68
        STA BCNT        :after ecb found (C = 0) or       85 M2
        PLA             :on zero length (C = 1).          68
ENDRTS  RTS             :Exit ECAL.                       60


:JOB       To examine a 1 to 31 byte data block with
:          appended error correction byte, correcting any
:          single bit inversion indicated.
:ACTION    IF block byte length > 0 AND < 32 THEN:
:          [ Calculate new error correction byte (ecb).
:            Correction code = new ecb EOR appended ecb.
:            IF set bit(s) in correction code THEN:
:            [ IF correction code addresses data bit THEN:
:              [ Correct bit error by re-inversion. ] ] ]
:CPU       6502
:HARDWARE  Memory containing data block.
:SOFTWARE  "ECAL" - routine to calculate ecb.
:INPUT     M0,1 addresses first byte of the data block.
:          Y = no. of data bytes in block (max. 31).
:          (Y indexes appended error correction byte).
:OUTPUT    M0,1 is unchanged.
:          C = 1: aborted (Y = 0 or Y > 31).
:          C = 0: Data assumed valid.
:                 Error not found: Y is unchanged.
:                 Error found: Single bit corrected.
:                              Y indexes corrected byte.
:ERRORS    More than one real bit-error, or a bit error in
:          the error correction byte can result in an
:          uncorrupted bit being inverted. Several errors
:          in the data can cancel each other out.
:REG USE   P Y
:STACK USE 7 (including JSR ECAL)
:RAM USE   M0 M1
:LENGTH    53
:CYCLES    Average 192 + 175 per data byte.
:CLASS 2   *discreet *interruptable *promable
:          *reentrant *relocatable -robust

DATA    = M0            :2-byte stored data block address.
ERRI    = M2            :Store for error correction code.

EFIX    PHA             :Save A and X                     48
        TXA             :for use in EFIX.                 8A
        PHA             :                                 48
        JSR ECAL        :Get new ecb for block but        20 lo hi
        BCS ENDPLR      :abort (C = 1) if ECAL aborts.    B0 29
        LDA ERRI        :Save page zero storage           A5 M2
        PHA             :for error correction.            48
        TXA             :Move new ecb to A and test       8A
        EOR (DATA),Y    :against appended ecb, if         51 M0
        CMP #8          :difference < 8 then no           C9 08
        BCC ENDPLZ      :error found in data block.       90 1C
        TAX             :Any set bits in EOR'd ecb        AA
        LSR A           :bits 7 to 3 are index to         4A
        LSR A           :position of corrupt byte in      4A
        LSR A           :data block, so convert to        4A
        STA ERRI        :byte index, store in page 0      85 M2
        CPY ERRI        :and test if position is in       C4 M2
        BCC ENDPLZ      :data block, exit if outside.     90 12
        TYA             :In block, so convert from        98
        SBC ERRI        :index-from-block-end to          E5 M2
        TAY             :index-from-start, in Y.          A8
        TXA             :Bits 2,1,0 of EOR'd ecb          8A
        AND #7          :index corrupt bit in byte        29 07
        TAX             :so isolate them, clear A and     AA
        LDA #0          :using set C from "SBC ERRI",     A9 00
SETBLP  ROL A           :Shift one set bit into A         2A
        DEX             :in same position as the          CA
        BPL SETBLP      :inverted bit in corrupt byte.    10 FC
        EOR (DATA),Y    :Use set bit to re-invert         51 M0
        STA (DATA),Y    :corrupt bit, return to block.    91 M0
ENDPLZ  PLA             :Restore page zero byte           68
        STA ERRI        :used for error code.             85 M2
ENDPLR  PLA             :Restore X and A                  68
        TAX             :then exit EFIX with              AA
        PLA             :C = 1 if aborted or              68
        RTS             :C = 0 if block okay.             60


10 


Legible Listings 


Most computers come supplied with efficient software that 
includes various output formats for the different types of data and 
programs that you are likely to use. Sometimes, however, the 
printout or screen display doesn’t carry all the information you 
might require. Or the information is there but difficult to read. 


This chapter contains two routines which ‘patch in’ to the 
computer’s output routine and edit the information being sent to it. 
The first of these routines is written to be generally applicable and 
requires only that your computer uses ASCII codes. The second, 
though, is written specifically for the BBC computer as an assembler 
program embedded in and set up by BBC BASIC. Don’t skip it if 
you don’t have a BBC computer; the method it uses can easily be 
adapted to customise all formatted outputs on other computers. 


IN CONTROL 


CTRPRT by Tim Herklots intercepts ASCII data being sent to an 
output routine and converts all control codes $00 to $1F into their 2- 
or 3-letter standard abbreviations. The abbreviations are bracketed 
by LESS-THAN and GREATER-THAN symbols < and >. 


One major use for CTRPRT is to extend the capabilities of code dump 
routines. Some of these print out the ASCII characters alongside the 
hexadecimal digits in case text data and not machine code is coming 
through. ASCII control codes (and values above $7E, for that 
matter) are usually printed either as spaces or periods — not very 
useful if your code is doing a lot of screen movements! 


Here is an example of the sort of effect CTRPRT has on a section of 
code. It is a data string which clears the screen and then prints out 
two messages at carefully positioned screen locations. 


NORMAL HEX DUMP 


0C 0A 0A 0A 11 11 4D 45   ......ME 
53 53 41 47 45 20 31 0D   SSAGE 1. 
0A 0A 0A 6D 65 73 73 61   ...messa 
67 65 20 32               ge 2 


HEX DUMP THROUGH CTRPRT 


0C 0A 0A 0A 11 11 4D 45   <FF><LF><LF><LF><DC1><DC1>ME 
53 53 41 47 45 20 31 0D   SSAGE 1<CR> 
0A 0A 0A 6D 65 73 73 61   <LF><LF><LF>messa 
67 65 20 32               ge 2 


= CTRPRT   Control character name print.
:JOB       To intercept an ASCII control code destined for
:          a print routine, converting it to its three
:          letter abbreviation enclosed in brackets.
:ACTION    IF character is a control code, THEN:
:          [ Use code * 3 as index to abbreviation table.
:            Copy abbreviation to stack. Put "<" on stack.
:            FOR count of 4:
:            [ Pull byte. IF NOT "*" THEN: [ Print. ] ]
:            Set character = ">". ]
:          Exit to Print routine.
:CPU       6502
:HARDWARE  None.
:SOFTWARE  "WRCHAR" - subroutine to print ASCII char. in A.
:INPUT     A contains ASCII character or control code.
:OUTPUT    If input is a character ($20+), A is unchanged.
:          If input is a control code, output A = ">".
:ERRORS    None.
:REG USE   A P
:STACK USE 4
:RAM USE   M0
:LENGTH    146 (Code: 50. Data: 96).
:CYCLES    Not given.
:CLASS 2   -discreet *interruptable *promable
:-***-*    *reentrant -relocatable *robust

CTRPRT  CMP #32         :Pass straight through if         C9 20
        BCS CPEND       :not a control code.              B0 2B
        STA M0          :Else code table offset           85 M0
        ASL A           :is 3 * code number.              0A
        ADC M0          :                                 65 M0
        STX M0          :Save X and move offset           86 M0
        TAX             :to X as code index.              AA
        LDA CTAB+2,X    :Get last letter                  BD lo hi
        PHA             :to stack.                        48
        LDA CTAB+1,X    :Get middle letter                BD lo hi
        PHA             :to stack.                        48
        LDA CTAB,X      :Get first letter                 BD lo hi
        PHA             :to stack.                        48
        LDA #$3C        :Get LESS-THAN symbol             A9 3C
        PHA             :to stack.                        48
        LDX M0          :Restore X.                       A6 M0
        LDA #4          :Set count of stacked             A9 04
        STA M0          :letters in M0.                   85 M0
CPLP    PLA             :Get character off stack          68
        CMP #42         :but if "*" space filler          C9 2A
        BEQ CPLTST      :then miss printing it.           F0 03
        JSR WRCHAR      :Go print character.              20 lo hi
CPLTST  DEC M0          :Repeat for four stacked          C6 M0
        BNE CPLP        :characters.                      D0 F4
        LDA #$3E        :End with GREATER-THAN.           A9 3E
CPEND   JMP WRCHAR      :Exit to print last char.         4C lo hi

CTAB    .BYTE 78,85,76  :NUL Null                         4E 55 4C
        .BYTE 83,79,72  :SOH Start of Heading             53 4F 48
        .BYTE 83,84,88  :STX Start Text                   53 54 58
        .BYTE 69,84,88  :ETX End Text                     45 54 58
        .BYTE 69,79,84  :EOT End of Transmission          45 4F 54
        .BYTE 69,78,81  :ENQ Enquiry                      45 4E 51
        .BYTE 65,67,75  :ACK Acknowledge                  41 43 4B
        .BYTE 66,69,76  :BEL Bell                         42 45 4C
        .BYTE 66,83,42  :BS* Backspace                    42 53 2A
        .BYTE 72,84,42  :HT* Horizontal Tab               48 54 2A
        .BYTE 76,70,42  :LF* Line Feed                    4C 46 2A
        .BYTE 86,84,42  :VT* Vertical Tab                 56 54 2A
        .BYTE 70,70,42  :FF* Form Feed                    46 46 2A
        .BYTE 67,82,42  :CR* Carriage Return              43 52 2A
        .BYTE 83,79,42  :SO* Shift Out                    53 4F 2A
        .BYTE 83,73,42  :SI* Shift In                     53 49 2A
        .BYTE 68,76,69  :DLE Data Link Escape             44 4C 45
        .BYTE 68,67,49  :DC1 Device Control 1             44 43 31
        .BYTE 68,67,50  :DC2 Device Control 2             44 43 32
        .BYTE 68,67,51  :DC3 Device Control 3             44 43 33
        .BYTE 68,67,52  :DC4 Device Control 4             44 43 34
        .BYTE 78,65,75  :NAK Negative Acknowledge         4E 41 4B
        .BYTE 83,89,78  :SYN Synchronous Idle             53 59 4E
        .BYTE 69,84,66  :ETB End Transmission Block       45 54 42
        .BYTE 67,65,78  :CAN Cancel                       43 41 4E
        .BYTE 69,77,42  :EM* End of Medium                45 4D 2A
        .BYTE 83,85,66  :SUB Substitute                   53 55 42
        .BYTE 69,83,67  :ESC Escape                       45 53 43
        .BYTE 70,83,42  :FS* File Separator               46 53 2A
        .BYTE 71,83,42  :GS* Group Separator              47 53 2A
        .BYTE 82,83,42  :RS* Record Separator             52 53 2A
        .BYTE 85,83,42  :US* Unit Separator               55 53 2A


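For checking the expected output of CTRPRT on a larger machine, the same substitution can be modelled in a few lines of Python. This is only a sketch of the effect of the routine, not of its stack-based method; the names NAMES and ctrprt are invented here, and the table simply repeats the abbreviations held in CTAB without the "*" fillers.

```python
# Abbreviations for control codes $00 to $1F, in code order.
NAMES = ("NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
         "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US").split()


def ctrprt(text: bytes) -> str:
    """Return text with every control code replaced by its bracketed abbreviation."""
    out = []
    for code in text:
        if code < 0x20:
            out.append("<" + NAMES[code] + ">")   # control code: name in < >
        else:
            out.append(chr(code))                 # anything else passes straight through
    return "".join(out)


sample = bytes([0x0C, 0x0A, 0x0A, 0x0A, 0x11, 0x11]) + b"MESSAGE 1\r"
shown = ctrprt(sample)
```

With the sample string from the hex dump above, shown holds the same text as the first two lines of the CTRPRT dump.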
BBC ASSEMBLER LISTING 


BBC BASIC is a pretty powerful tool, allowing you to embed 
assembly language programs which can use BASIC variables. The 
assembler also supports multi-statement lines as in BASIC. 


The only problem with this wealth of facilities is that of readability. 
The Beeb doesn’t go in for standard assembler formatting and, rather 
than helping you to find your way through a routine, comments tend 
to obscure, as in Listing 1 of LSTFMT. 


10REM - "LSTFMT" - FORMATTED ASSEMBLER LISTING
20H%=PAGE/256-1:L%=&15
30*FX 6,10
40*KEY 10 |N OLD|M
50*KEY0"?&20E=&A4:?&20F=&E0|M"
60*KEY1"?&20E=L%:?&20F=H%:!&70=0|M"
70REM - VECTOR PRINT ROUTINE WRCH
80OSWRCH=?&20E+?&20F*256
90REM - SYSTEM READ OF CURSOR POSITION.
100OSBYTE=&FFF4:csrpos=&86
110REM - SET UP PAGE ZERO USE
120asmfl=&70:cmtfl=&71:lblfl=&72:litfl=&73:tabst=&74:tempA=&75
130REM - NAME FORMATTING CHARACTERS
140asmin=&5B:asmout=&5D:colon=&3A:quotes=&22:carret=&0D:lnfeed=&0A
150space=&20:label=&2E:commnt=&5C
160REM - FIELD TAB SETTINGS
170coltab=4:lbltab=6:mnmtab=14:cmntab=28:lwidth=59
180FOR I=0 TO 2 STEP 2
190P%=H%*256+L%
200[OPT I
210STA tempA \Save value in zero page
220PHP:TXA:PHA:TYA:PHA \Save registers and
230LDX tempA \get input value in X for tests.
240TXA:SEC:SBC #carret:BNE qtest \Test for line end.
250STA cmtfl:STA lblfl:STA litfl \Clear comment, label & literal flags.
260BEQ bexit \Hop to exit.
270.qtest CPX #quotes:BNE lftest \Test for literal start/end.
280LDA litfl:EOR #1:STA litfl \Toggle literal flag on/off.
290DEX:BNE width \Hop to line-end test.
300.lftest LDA litfl:BNE width \No formatting if flag on.
310CPX #colon:BNE asstst \Test statement as new line.
320STA cmtfl:STA lblfl \Clear flags
330JSR lfeed \Go to new line.
340LDY #coltab:JSR tabout \Tab to colon position.
350TXA:JSR OSWRCH \Print colon,
360LDA #space:STA tempA:BNE exit \then a space on exit.
370.asstst CPX #asmin:BNE asetst \Test for assembler start.
380INC asmfl:BNE width \Flag on and go to line-end test.
390.asetst CPX #asmout:BNE aftst \Test for assembler end.
400DEC asmfl:BEQ width \Flag off and go to line-end test.
410.aftst LDA asmfl:BEQ width \BASIC if flag off - no format.
420LDA cmtfl:BNE width \Inside a comment - carry on.
430CPX #commnt:BNE lbftst \Test for comment start.
440INC cmtfl:LDA #cmntab:JSR pos \Comment flag on and test print position.
450BCC exit:BEQ exit:TAY:JSR tabout \Tab to comment field if needed.
460.bexit BEQ exit \Also "stepping stone" to 'exit'.
470.lbftst LDA lblfl:BEQ lbtest \Test if inside a label.
480CPX #space:BNE exit \Print if not end of label,
490DEC lblfl:BEQ exit \else flag off and print space.
500.lbtest CPX #label:BNE lintst \Test for label start.
510INC lblfl:BNE exit \Flag on and exit to print "."
520.lintst CPX #space:BEQ exit \Print space on exit.
530CPX #&30:BCC mnmtst:CPX #&3A:BCC exit \Digits ok - probably line number.
540.mnmtst LDA #mnmtab:JSR pos:BCC exit:BEQ exit \Exit if in mnemonic,
550TAY:JSR tabout:BEQ exit \else tab to mnemonic field.
560.width LDA #lwidth:JSR pos:BCS exit \Okay if not at line-end
570JSR lfeed \else next line and
580LDA cmtfl:BEQ lblpos \skip if not in a comment
590LDY #cmntab:JSR tabout \else tab up to comment position and
600LDA #commnt:JSR OSWRCH:INY:BNE exit \write new comment symbol.
610.lblpos LDY #lbltab:JSR tabout \Else tab to label field.
620.exit PLA:TAY:PLA:TAX:PLP \Restore registers and
630LDA tempA:JMP OSWRCH \exit through character print routine.
640.tabout LDA #space:JSR OSWRCH:DEY:BNE tabout:RTS
650.lfeed LDA #carret:JSR OSWRCH \Using OSNEWL would send CHR$ 13 and 10
660LDA #lnfeed:JSR OSWRCH \through LSTFMT and overwrite
670RTS \the character stored in tempA.
680.pos PHA:LDA #csrpos:JSR OSBYTE \Read text cursor position.
690STX tabst:LDX tempA:PLA \store it and recover registers
700SEC:SBC tabst:RTS \get position difference and return.
710]
720NEXT

LSTFMT is a patch by Sverrir Karlsson which intercepts characters 
on their way to the BBC print routine WRCH. It formats the listing of 
both normal Basic and assembler lines to one statement or 
instruction per printed line. Assembly language is also formatted to 
normal line number, label, mnemonic, operand and comment fields. 
Listing 2 is the result of LSTFMT acting on itself. 


10 REM - "LSTFMT" - FORMATTED ASSEMBLER LISTING 
20 H%=PAGE/256-1 
: L%=&15 
30 *FX 6,10 
40 *KEY 10 |N OLD|M 
50 *KEY0"?&20E=&A4:?&20F=&E0|M" 
60 *KEY1"?&20E=L%:?&20F=H%:!&70=0|M" 
70 REM - VECTOR PRINT ROUTINE WRCH 
80 OSWRCH=?&20E+?&20F*256 
90 REM - SYSTEM READ OF CURSOR POSITION 
100 OSBYTE=&FFF4 
: csrpos=&86 
110 REM - SET UP PAGE ZERO USE 
120 asmfl=&70 
: cmtfl=&71 
: lblfl=&72 
: litfl=&73 
130 
140 


150 


160 
170 


180 
190 
200 
210 
220 


230 


240 
250 


260 
270 


280 


290 
300 
310 


320 


: tabst=874 
tempA=&75 
REM - NAME FORMATTING CHARACTERS 
asmin=&5B 
> asmout=&5D 
: colon=&3A 
: quotes=&22 
: carret=&0D 
Lnfeed=&0A 
space=&20 
Label=&2E 
commnt=&5C 
REM - FIELD TAB SETTINGS 
coltab=4 
: Lbltab=6 
> mnmtab=14 
: cmntab=28 
Lwidth=59 
FOR I1=0 TO 2 STEP 2 
PL=HAR2564+LZ 
C OPT I 
STA tempA \Save value in zero page 
PHP 
TXA 
PHA 
TYA 
PHA \Save registers and 
LDX tempA \get input value in X for tests. 
\ 
TXA 
SEC 
SBC.#carret 
BNE qtest \Test for line end. 
STA cmtfl 
STA Lblfl 
STA Litfl \Clear comment, label & literal 
\flags. 
BEQ bexit \Hop to exit. 
~qtest CPX #quotes 
BNE lftest \Test for literal start/end. 
LDA Litfl 
EOR #1 
STA lLitfl \Toggle literal flag on/off. 
DEX 
BNE width \Hop to lLine-end test. 
.lftest LDA Litfl 
BNE width \No formatting if flag on. 
CPX #colon 
BNE asstst \Test statement as new line. 
STA cmtfl 
STA Lblfl \Clear flags 
JSR lfeed \Go to new line. 


330 
340 


350 


360 


LDY #coltab 

JSR tabout \Tab to colon position. 
TXA 

JSR OSWRCH \Print colon, 

LDA #space 

STA tempA 


370 


380 


390 


400 


410 
420 
430 


440 


450 


460 
470 
480 
490 
500 
510 
520 


530 


540 
550 
560 


570 
580 


-asstst 


-asetst 


.aftst 


.bexit 


-lbftst 


.lbtest 


.lintst 


omnmtst 


width 


BNE 
CPX 
BNE 
INC 
BNE 


CPX 
BNE 
DEC 
BEQ 


LDA 
BEQ 
LDA 
BNE 
CPX 
BNE 
INC 
LDA 
JSR 


BCC 
BEQ 
TAY 
JSR 


BEQ 


LDA 
BEQ 
CPX 
BNE 
DEC 
BEQ 
CPX 
BNE 
INC 
BNE 
CPX 
BEQ 
CPX 
BCC 
CPX 
BCC 


LDA 
JSR 
BCC 
BEQ 
TAY 
JSR 
BEQ 
LDA 
JSR 
BCS 
JSR 
LDA 
BEQ 


exit 
#asmin 
asetst 
asmfl 
width 


#asmout 


aftst 
asmf | 
width 


asmf l 
width 
cmtfl 
width 


#commnt 


lbftst 
cmtf l 


#cmntab 


pos 


exit 
exit 


tabout 
exit 


Lblfl 
Lbtest 
#space 
exit 
LoL fl 
exit 
#label 
Lintst 
Lbl fl 
exit 
#space 
exit 
#8&30 
mnmtst 
#&3A 
exit 


#mnmtab 


pos 
exit 
exit 


tabout 
exit 


#lwidth 


pos 
exit 
Lfeed 
cmtfl 
lblpos 




\then a space on exit. 
\Test for assembler start. 


\Flag on and go to line-end test 
\. 


\Test for assembler end. 

\Flag off and go to line-end tes 
\t. 

\BASIC if flag off - no format. 
\Inside a comment - carry on. 
\Test for comment start. 


\Comment flag on and test print 
\position. 


\Tab to comment field if needed. 
\Ats "stepping stone to ‘exit’. 
\Test if inside a label. 

\Print if not end of label, 
\else flag off and print space. 
\Test for label start. 


\Flag on and exit to print "." 


\Print space on exit. 


\Digits ok - probably Line numbe 
\r. 


\Exit if in mnemonic, 


tab to mnemonic field. 


\else 


\O0kay 
\else 


if not at line-end 
next line and 


\skip if not in a comment 
590 LDY #cmntab 
: JSR tabout \else tab up to comment position 
\ and 
600 LDA #commnt 
: JSR OSWRCH 
INY 
BNE exit \write new comment symbol. 


610 .lblpos LDY #lbltab 
: JSR tabout \Else tab to label field. 
620 .exit PLA 
: TAY 
PLA 
TAX 
: PLP \Restore registers and 
630 LDA tempA 
: JMP OSWRCH \exit through character print ro 
\utine. 
640 .tabout LDA #space 
: JSR OSWRCH 
DEY 
BNE tabout 
: RTS 
650 .lfeed LDA #carret 
: JSR OSWRCH \Using OSNEWL would send CHR$ 13 


\ and 10 
660 LDA #lnfeed 
: JSR OSWRCH \through LSTFMT and overwrite 
670 RTS \the character stored in tempA. 


680 .pos PHA 
: LDA #csrpos 
JSR OSBYTE \Read text cursor position. 


690 STX tabst 

: LDX tempA 

: PLA \store it and recover registers 
700 SEC 

: SBC tabst 

RTS \get position difference and ret 
\urn. 

710 J 
720 NEXT 


LSTFMT should be typed in as it appears in Listing 1 (without spaces 
after line numbers; the BBC LIST option command will insert 
them later) and RUN. After running, type LISTO1, press FUNCTION 
KEY 1 and type LIST. The program should appear as in Listing 2 
both on screen and, if you have enabled your printer (CTRL B), on 
paper. 


Pressing FUNCTION KEY 0 will cause listing in the normal List Option 
1 and the BREAK key (FUNCTION KEY 10) will cause a reversion to 
unspaced listing. 


The string assigned in *KEY1 loads the address of LSTFMT into the 
OSWRCH vector, causing print subroutine calls to be directed to 
LSTFMT instead of WRCH. The string in *KEY0 replaces the address 
of WRCH in the vector, disabling LSTFMT. *FX 6,10 stops line feed 
($0A, ‘LF’) being sent to the printer; remove it if your printer 
doesn’t perform line feed automatically when it receives a carriage 
return. 


LSTFMT assembles the machine code in the page below the start of 
BASIC program space. In a cassette system this start is &E00 (to use 
the BBC BASIC hex symbol) but it is &1900 in a DFS disk system. To be 
certain that the machine code will not overwrite anything, or be 
overwritten, you could clear a 256-byte page for it below the BASIC 
program space by typing PAGE = &F00 (or PAGE = &1A00 for a disk 
system). 
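

The address arithmetic in lines 20 and 190 of Listing 1 (H%=PAGE/256-1, L%=&15, P%=H%*256+L%) can be checked with a short Python sketch. The function name lstfmt_origin is made up for this illustration, and the integer division stands in for the truncation performed when BBC BASIC assigns to the integer variable H%.

```python
def lstfmt_origin(page: int, offset: int = 0x15) -> int:
    """Return the address at which Listing 1 starts assembling LSTFMT."""
    high = page // 256 - 1        # H%: high byte of the page below PAGE
    return high * 256 + offset    # P%: start of the assembled code


cassette = lstfmt_origin(0x0E00)  # default PAGE on a cassette system
disk = lstfmt_origin(0x1900)      # default PAGE on a DFS disk system
raised = lstfmt_origin(0x0F00)    # after typing PAGE = &F00
```

With PAGE raised to &F00 the code lands at &E15, inside the page that has just been freed below the BASIC program.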


One final word of warning: don’t attempt to edit your programs in 
the formatted mode; LSTFMT is strictly a one way process. 


11 


Dot Graphics 


The routines in this chapter are all different methods of 
manipulating graphics information of a very particular type and 
were sent to Sub Set in response to the following challenge. 





ELEGANT SOLUTION INVITED 


In this problem, the target byte holds information which determines 
how 4 graphics dots are to be displayed as either background (not 
showing) or as one of three colours. This dot information is held in 
the least significant 2 bits of a byte, which represent the four states 
00, 01, 10 and 11 with the 6 most significant bits of the byte reset. 
The information for the dots, which are numbered 0 to 3, is arranged 
in the target byte thus: 


bit  7 6 5 4 3 2 1 0 
dot  0 1 2 3 0 1 2 3 


Dot 0 information is in bits 7 and 3 of the byte, dot 1 in bits 6 and 2, 
dot 2 in bits 5 and 1 and dot 3 in bits 4 and 0. 


Given a 2-byte address of the target byte, a 1-byte dot number 
(binary 0 to 3) and 1 byte of dot information (wherever you like it in 
either registers, memory, or as parameters embedded in the code 
following the subroutine call), we want the most elegant routine to 
place the dot information, according to dot number, in the target 
byte, without disturbing any of the information relating to the other 
3 dots. 
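

As a reference against which any 6502 solution can be checked, the required result can be stated in a few lines of Python. This is a sketch only: the name put_dot is invented here, and it assumes the convention used in the datasheets of this chapter, namely that bit 1 of the dot information goes to the high nibble and bit 0 to the low nibble.

```python
def put_dot(target: int, dot: int, info: int) -> int:
    """Return the target byte with the 2 bits of info placed for dot number 0 to 3."""
    if not (0 <= dot <= 3 and 0 <= info <= 3):
        raise ValueError("dot number and dot information must be 0 to 3")
    keep = ~(0x88 >> dot) & 0xFF                          # clears only this dot's two bits
    split = (((info & 2) << 6) | ((info & 1) << 3)) >> dot  # %a000b000 moved to the dot position
    return (target & keep) | split


cleared = put_dot(0xFF, 0, 0)     # dot 0 set to background, other dots untouched
placed = put_dot(0x00, 1, 2)      # dot 1 gets information %10
```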


Those of you with the right hardware will recognise the format of the 
target byte as that used in display modes 1 and 5 of the BBC Micro. 
So solutions in 6502 code will be particularly relevant, though 
solutions in other code will be interesting. 


6502 ELEGANT SOLUTIONS 


There was a very large response to this challenge and only a selection 
of the shortest, quickest and most unusual contributions could be 
published. Some readers favoured fully processed solutions, which 
tend to be short but slow, whilst others went for the fast but 
byte-heavy table look-up methods. 


DOT1 is by D. A. Stanford. At 32 bytes, it is the shortest solution to 
operate in a reasonable time. 


:JOB       To store graphic information to 2 specified bits
:          of a graphic byte.
:ACTION    Reformat dot information byte to match graphic
:          byte format.
:          Construct bit mask to graphic byte format.
:          Mask out old dot information in graphic byte.
:          Mask in new dot information to graphic byte.
:CPU       6502
:HARDWARE  Graphics byte in (video) RAM.
:SOFTWARE  None.
:INPUT     M0,1 addresses graphic byte where information is
:          to be stored.
:          M2 contains the dot number relating to pairs of
:          bits in the graphic byte:
:            dot number = 0: graphic byte bit 7 & bit 3.
:            dot number = 1: graphic byte bit 6 & bit 2.
:            dot number = 2: graphic byte bit 5 & bit 1.
:            dot number = 3: graphic byte bit 4 & bit 0.
:          M3 contains information to be entered in the
:          addressed graphic byte bit pair:
:            bit 1,M3 going to high nibble dot.
:            bit 0,M3 going to low nibble dot.
:OUTPUT    Information entered. M0, M1 and M2 not changed.
:          P, A, Y and M3 changed.
:ERRORS    No check that dot number and dot information are
:          both in the range 0 to 3.
:REG USE   P A Y
:STACK USE 1
:RAM USE   M0 to M3
:LENGTH    32
:CYCLES    Minimum: 54. Maximum: 95.
:CLASS 2   -discreet *interruptable *promable
:-****-    *reentrant *relocatable -robust

ADDR    = M0            :2-byte stored graphic byte address.
DOTNO   = M2            :Stored dot number.
DOTINF  = M3            :Stored dot information.

DOT1    LDA DOTINF      :Get inf as   %0000 00ab  C=?     A5 M3
        LSR A           :Split bits:  %0000 000a  C=b     4A
        ROR A           :             %b000 0000  C=a     6A
        PHP             :Save bit "a" in stacked C.       08
        LSR A           :             %0b00 0000  C=0     4A
        LSR A           :             %00b0 0000  C=0     4A
        LSR A           :             %000b 0000  C=0     4A
        PLP             :Restore bit "a" to C.            28
        ROR A           :             %a000 b000  C=0     6A
        STA DOTINF      :Store split information.         85 M3
        LDA #$77        :Mask to delete old information.  A9 77
        LDY DOTNO       :Get dot no. as count, skip if    A4 M2
        BEQ INSRT       :mask & info in right place,      F0 07
SHIFT   LSR DOTINF      :Else shift info byte by 1 bit,   46 M3
        SEC             :prepare to shift in set bit,     38
        ROR A           :shift delete-mask by 1 bit       6A
        DEY             :until info and mask are both     88
        BNE SHIFT       :at right dot number.             D0 F9
INSRT   AND (ADDR),Y    :Mask out old information and     31 M0
        ORA DOTINF      :replace by new information       05 M3
        STA (ADDR),Y    :then re-store graphic byte.      91 M0
        RTS             :Exit, information placed.        60


D. A. Stanford also contributed a variation to DOT1 which replaces 
the 8-byte rotation sequence (instructions 2 to 9) by a single 3-byte 
instruction which reads the reformed dot information from a 4-byte 
table. DOT2 beats DOT1 by one byte and 15 clock cycles. 


DATA    .BYTE %00000000 :Split of information 00.         00
        .BYTE %00001000 :                     01.         08
        .BYTE %10000000 :                     10.         80
        .BYTE %10001000 :                     11.         88

DOT2    LDY DOTINF      :Convert from index as            A4 M3
        LDA DATA,Y      :%000000ab to split form          B9 lo hi
        STA DOTINF      :%a000b000 in DOTINF.             85 M3
                        :...Continue as for DOT1.


But, faster and shorter though it is, D. A. Stanford didn’t think 
DOT2 as ‘elegant’ as DOT1. 


The notion of ‘elegance’ in a routine, code or merely the method 
used seemed to puzzle many contributors. Elegant solutions are 
those which are simple and don’t require a lot of forced manipulation. 
For example, the following code splits the 2 bits of information 
to separate nibbles (albeit to dot number 3 instead of dot number 0) 
without the mixture of rotates, shifts, pushes and pulls used in 
DOT1. It is my 6502 implementation of a method contributed in 8085 
code. 


SPLIT3  LDA DOTINF      :Get information as %000000ab,    A5 M3
        CLC             :Prepare for addition.            18
        ADC #$0E        :Propagate bit 1 to bit 4 and     69 0E
        AND #$11        :mask out all unwanted bits.      29 11
        STA DOTINF      :Store information as %000a000b.  85 M3


SPLIT3 is 3 bytes shorter and 13 clock cycles faster than the method 
used in DOT1. Unfortunately, it takes another 4 bytes and 4 clock 
cycles to extend the method so that it converts the information to the 
form needed by DOT1. 


SPLIT0  LDA DOTINF      :Get information as %000000ab,    A5 M3
        CLC             :Prepare for addition.            18
        ADC #$0E        :Propagate bit 1 to bit 4 and     69 0E
        AND #$11        :mask out all unwanted bits.      29 11
        ADC #$77        :Propagate 4 to 7 and 0 to 3 and  69 77
        AND #$88        :mask out all unwanted bits.      29 88
        STA DOTINF      :Store information as %a000b000.  85 M3
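

The add-and-mask trick is quickly confirmed by running all four input values through the same arithmetic. The Python below is a sketch for that purpose; the function names merely echo the labels above, and the carry into each ADC is taken as clear, as it is after CLC and after an addition that cannot overflow 8 bits.

```python
def split3(info: int) -> int:
    """%000000ab -> %000a000b: adding $0E carries bit 1 up into bit 4."""
    return (info + 0x0E) & 0x11


def split0(info: int) -> int:
    """%000000ab -> %a000b000: adding $77 carries bit 4 to bit 7 and bit 0 to bit 3."""
    return (split3(info) + 0x77) & 0x88


dot3_forms = [split3(i) for i in range(4)]
dot0_forms = [split0(i) for i in range(4)]
```

The two lists come out as the dot 3 and dot 0 entries that a table look-up method would otherwise have to store.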


TABLE LOOK-UP 


DOT3 by Oscar Burke uses look-up tables for both the mask to delete 
the existing information in the dot bits and the split information to 
be entered. With 19 bytes for the routine and another 20 bytes for 
the two tables, it is nearly 25% longer than DOT1 but quite a bit 
faster. Operating speed is probably more important than the length 
of a routine which deals only with one dot on a high resolution 
screen. Oscar has squeezed an extra ounce of speed out of DOT3 by 
having the index registers X and Y already loaded on entry, which 
could well be the case in a complete application. 


= DOT3     Put graphics dot information
:JOB       To store graphic information to 2 specified bits
:          of a graphic byte.
:ACTION    Use dot number to index and get mask-out byte.
:          Use dot number and dot information to index
:          mask-in byte.
:          Mask out old dot information in graphic byte.
:          Mask in new dot information to graphic byte.
:CPU       6502
:HARDWARE  Graphics byte in (video) RAM.
:SOFTWARE  None.
:INPUT     M0,1 contains the base address of a block or
:          file of graphic bytes.
:          Y indexes the particular graphic byte where
:          information is to be stored.
:          X contains the dot number relating to pairs of
:          bits in the graphic byte:
:            dot number = 0: graphic byte bit 7 & bit 3.
:            dot number = 1: graphic byte bit 6 & bit 2.
:            dot number = 2: graphic byte bit 5 & bit 1.
:            dot number = 3: graphic byte bit 4 & bit 0.
:          M3 contains information to be entered in the
:          addressed graphic byte bit pair:
:            bit 1,M3 going to high nibble dot.
:            bit 0,M3 going to low nibble dot.
:OUTPUT    Information entered. Y, M0, M1, M3 not changed.
:          P, A, X changed.
:ERRORS    No check that dot number and dot information are
:          both in the range 0 to 3.
:REG USE   P A X Y
:STACK USE 1
:RAM USE   M0 M1 M3
:LENGTH    19 (plus 20 bytes for tables MASK and VAL).
:CYCLES    43
:CLASS 2   -discreet *interruptable *promable
:-***--    *reentrant -relocatable -robust

BASE   = M0          :2-byte stored base address of block
                     :of graphic bytes (indexed by Y).
DOTINF = M3          :Stored dot information.

DOT3   LDA MASK,X    :Get mask for later mask out       BD lo hi
       PHA           :of old info, saved on stack.      48
       TXA           :Convert dot number to index       8A
       ASL A         :table VAL, dot number indexes     0A
       ASL A         :block of 4 bytes, dot info        0A
       ORA DOTINF    :indexes particular byte in        05 M3
       TAX           :block, index to X.                AA
       PLA           :Restore mask to A.                68
       AND (BASE),Y  :Mask out old information and      31 M0
       ORA VAL,X     :replace by new information        1D lo hi
       STA (BASE),Y  :then re-store graphic byte.       91 M0
       RTS           :Exit, information placed.         60

MASK   .BYTE %01110111  :Mask out for dot 0.            77
       .BYTE %10111011  :Mask out for dot 1.            BB
       .BYTE %11011101  :Mask out for dot 2.            DD
       .BYTE %11101110  :Mask out for dot 3.            EE

VAL    .BYTE %00000000  :Mask in, dot 0 info 00.        00
       .BYTE %00001000  :             0      01         08
       .BYTE %10000000  :             0      10         80
       .BYTE %10001000  :             0      11         88
       .BYTE %00000000  :             1      00         00
       .BYTE %00000100  :             1      01         04
       .BYTE %01000000  :             1      10         40
       .BYTE %01000100  :             1      11         44
       .BYTE %00000000  :             2      00         00
       .BYTE %00000010  :             2      01         02
       .BYTE %00100000  :             2      10         20
       .BYTE %00100010  :             2      11         22
       .BYTE %00000000  :             3      00         00
       .BYTE %00000001  :             3      01         01
       .BYTE %00010000  :             3      10         10
       .BYTE %00010001  :             3      11         11
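
The table look-up can be followed more easily in a high-level model. The
sketch below is not part of the original datasheet: it is a hypothetical
Python rendering of DOT3, with the function name dot3 and its argument
names chosen only for illustration. It uses the same MASK and VAL tables
and the same index calculation (dot number * 4 + dot information).

```python
# Hypothetical Python model of the DOT3 table look-up (illustration only).
MASK = [0x77, 0xBB, 0xDD, 0xEE]      # mask-out byte for dots 0 to 3
VAL = [0x00, 0x08, 0x80, 0x88,       # dot 0, info 00, 01, 10, 11
       0x00, 0x04, 0x40, 0x44,       # dot 1
       0x00, 0x02, 0x20, 0x22,       # dot 2
       0x00, 0x01, 0x10, 0x11]       # dot 3


def dot3(graphic_byte, dot_number, dot_info):
    """Return graphic_byte with the bit pair of dot_number set to dot_info."""
    index = (dot_number << 2) | dot_info          # TXA, ASL A, ASL A, ORA DOTINF
    return (graphic_byte & MASK[dot_number]) | VAL[index]


if __name__ == "__main__":
    print(hex(dot3(0x00, 1, 3)))
```

As in the assembly routine, no check is made that the dot number and dot
information are in the range 0 to 3.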


PROCESSED LOOK-UP 


Not quite the shortest and not quite the fastest is this compromise
between the use of look-up tables and computation by Glen Slade.
DOT4 actually manipulates the mask to be both a mask-out byte to
delete existing information and a mask-in byte to pick up the new
information only in the required dot positions.


:JOB       To store graphic information to 2 specified bits
:          of a graphic byte.
:ACTION    Use dot number to index and get position mask.
:          Use inverted mask to mask out old information.
:          Use dot information to index dot value table.
:          Use mask to get dot specific value.
:          Merge in new dot value to graphic byte.
:CPU       6502
:HARDWARE  Graphics byte in (video) RAM.
:SOFTWARE  None.
:INPUT     M0,1 addresses graphic byte where information is
:          to be stored.
:          M2 contains the dot number relating to pairs of
:          bits in the graphic byte:
:          dot number = 0: graphic byte bit 7 & bit 3.
:          dot number = 1: graphic byte bit 6 & bit 2.
:          dot number = 2: graphic byte bit 5 & bit 1.
:          dot number = 3: graphic byte bit 4 & bit 0.
:          M3 contains information to be entered in the
:          addressed graphic byte bit pair:
:          bit 1,M3 going to high nibble dot.
:          bit 0,M3 going to low nibble dot.
:OUTPUT    Information entered. M0, M1, M2, M3 not changed.
:          P, A, X, Y and M4 changed.
:ERRORS    No check that dot number and dot information are
:          both in the range 0 to 3.
:REG USE   P A X Y
:STACK USE 1
:RAM USE   M0 M1 M3 M4
:LENGTH    25 (plus 8 bytes for tables MASK and VAL).
:CYCLES    48
:CLASS 2   -discreet *interruptable *promable
:-***--    *reentrant -relocatable -robust

ADDR   = M0          :2-byte stored graphic byte address.
DOTNO  = M2          :Stored dot number.
DOTINF = M3          :Stored dot information.
TEMP   = M4          :For temporary storage of graphic byte.

DOT4   LDY DOTNO     :Use dot number as index to        A4 M2
       LDA MASK,Y    :getting dot position mask.        B9 lo hi
       PHA           :Saved for masking inf later.      48
       EOR #$FF      :Invert to mask-out byte.          49 FF
       LDY #0        :Zero index addressed byte.        A0 00
       AND (ADDR),Y  :Mask out old information and      31 M0
       STA TEMP      :store for later merge in.         85 M4
       LDX DOTINF    :Use info number as index.         A6 M3
       PLA           :Recover mask-in byte and get      68
       AND VAL,X     :value at dot bits only then       3D lo hi
       ORA TEMP      :merge with unaffected dots        05 M4
       STA (ADDR),Y  :and re-store graphic byte.        91 M0
       RTS           :Exit, information placed.         60

MASK   .BYTE %10001000  :Mask for dot 0.                88
       .BYTE %01000100  :Mask for dot 1.                44
       .BYTE %00100010  :Mask for dot 2.                22
       .BYTE %00010001  :Mask for dot 3.                11

VAL    .BYTE %00000000  :Info 00 for all dots.          00
       .BYTE %00001111  :  "   01  "                    0F
       .BYTE %11110000  :  "   10  "                    F0
       .BYTE %11111111  :  "   11  "                    FF
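
The double use of the mask in DOT4 is perhaps clearest when written out
in a high-level form. The following sketch is not from the original
datasheet: it is a hypothetical Python model (the name dot4 and its
arguments are assumptions made for illustration) that uses the same
8 bytes of table data.

```python
# Hypothetical Python model of the DOT4 "processed look-up" (illustration only).
MASK = [0x88, 0x44, 0x22, 0x11]      # position mask for dots 0 to 3
VAL = [0x00, 0x0F, 0xF0, 0xFF]       # info 00, 01, 10, 11 spread over all dots


def dot4(graphic_byte, dot_number, dot_info):
    """Return graphic_byte with the bit pair of dot_number set to dot_info."""
    mask = MASK[dot_number]
    kept = graphic_byte & (mask ^ 0xFF)      # inverted mask deletes old dot bits
    new_bits = VAL[dot_info] & mask          # same mask picks out the new dot bits
    return kept | new_bits


if __name__ == "__main__":
    print(hex(dot4(0xFF, 0, 0)))
```

The VAL table holds each information value repeated across every dot
position, so one mask is enough to select the wanted pair of bits.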


THE ROTATIONAL METHOD 


DOT5 by W. Anderton is by far the slowest method at 164 clock 
cycles but is as short as DOT1 and very elegant in principle. The 
target byte is rotated nine times through the Carry flag and, as the 
bits relating to the addressed dot move into Carry, they are replaced 
by the new information bits. 


The target byte in DOT5 is addressed absolutely since rotate 
instructions cannot use the indexed indirect addressing modes. 
However, the routine can easily be altered to use indirect addressing. 
Indirectly pick up the target byte in A just before the first call to 
ROLL (using the address in MO,1 indexed by Y, as in the other DOT 
routines) and store the adjusted byte back after the last ROLL-CALL 
(irresistible!). The first instruction in ROLL should be changed from 
‘ROR GBYTE’ to ‘ROR A’.


:= DOT5    Put graphics dot information
:JOB       To store graphic information to 2 specified bits
:          of a graphic byte.
:ACTION    Rotate graphic byte lo-nibble bit to Carry.
:          Move new lo-nibble information bit to Carry.
:          Rotate graphic byte hi-nibble bit to Carry (new
:          information bit being rotated in).
:          Move new hi-nibble information bit to Carry.
:          Rotate new information bit to correct place in
:          graphic byte.
:CPU       6502
:HARDWARE  Graphics byte in (video) RAM.
:SOFTWARE  "ROLL" - Local subroutine to rotate the graphic
:          byte right through Carry by X bits.
:INPUT     M2 contains the dot number relating to pairs of
:          bits in the graphic byte:
:          dot number = 0: graphic byte bit 7 & bit 3.
:          dot number = 1: graphic byte bit 6 & bit 2.
:          dot number = 2: graphic byte bit 5 & bit 1.
:          dot number = 3: graphic byte bit 4 & bit 0.
:          M3 contains information to be entered in the
:          graphic byte bit pair:
:          bit 1,M3 going to high nibble dot.
:          bit 0,M3 going to low nibble dot.
:OUTPUT    Information entered. M2 not changed.
:          P, A, X and M3 changed.
:ERRORS    No check that dot number and dot information are
:          both in the range 0 to 3.
:          Arithmetic error if D = 1 (decimal mode).
:REG USE   P A X
:STACK USE 2
:RAM USE   M2 M3
:LENGTH    32
:CYCLES    164
:CLASS 2   -discreet *interruptable *promable
:-***--    *reentrant -relocatable -robust

GBYTE  = $hilo       :Absolute address of graphic byte.
DOTNO  = M2          :Stored dot number.
DOTINF = M3          :Stored dot information.

DOT5   LDA #4        :Calculate number of rotations     A9 04
       SEC           :needed to shift addressed         38
       SBC DOTNO     :lo-nibble bit out to Carry,       E5 M2
       TAX           :move to X as rotation count       AA
       JSR ROLL      :and get bit out to Carry.         20 lo hi
       LSR DOTINF    :Replace C by new lo-nib bit,      46 M3
       LDX #4        :and go rotate it in, getting      A2 04
       JSR ROLL      :old hi-nib bit out to Carry.      20 lo hi
       LSR DOTINF    :Replace C by new hi-nib bit       46 M3
       LDX DOTNO     :and compute no. of rotates        A6 M2
       INX           :needed to restore correct         E8
       JSR ROLL      :positions, go do it, then         20 lo hi
       RTS           :exit, information placed.         60

ROLL   ROR GBYTE     :Rotate graphic byte, through      6E lo hi
       DEX           :Carry, for count of X.            CA
       BNE ROLL      :(Total of 9 rotations done in     D0 FA
       RTS           :DOT5, leaving GBYTE correct.)     60
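
The nine rotations are easier to trace in a model that treats the byte
and the Carry flag as one 9-bit ring. The sketch below is a hypothetical
Python rendering, not part of the original listing; the names roll and
dot5 mirror the assembly labels only for convenience, and the initial
carry of 1 is an assumption that follows from SEC and SBC producing no
borrow for dot numbers 0 to 3.

```python
# Hypothetical Python model of the DOT5 rotational method (illustration only).

def roll(byte, carry, count):
    """Rotate byte right through carry, count times (models ROR GBYTE)."""
    for _ in range(count):
        new_carry = byte & 1
        byte = (carry << 7) | (byte >> 1)
        carry = new_carry
    return byte, carry


def dot5(graphic_byte, dot_number, dot_info):
    """Return graphic_byte with the bit pair of dot_number set to dot_info."""
    # Old lo-nibble bit out to carry (carry assumed set after SEC/SBC).
    byte, carry = roll(graphic_byte, 1, 4 - dot_number)
    carry = dot_info & 1                 # LSR DOTINF: new lo-nibble bit to carry
    dot_info >>= 1
    byte, carry = roll(byte, carry, 4)   # rotate it in, old hi-nibble bit out
    carry = dot_info & 1                 # LSR DOTINF: new hi-nibble bit to carry
    byte, carry = roll(byte, carry, dot_number + 1)   # back to correct places
    return byte


if __name__ == "__main__":
    print(hex(dot5(0x00, 0, 3)))
```

The three rotation counts (4 - dot number, 4, and dot number + 1) always
total 9, which is one full turn of the 9-bit ring.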
Random numbers are readily obtained in the real world. Rolling
dice will provide you with a set of values 1 to 6 which are fairly 
random although any slight irregularity in a die will predispose it 
to favour one face more than another. Random binary values can 
be built up from tossing a single coin several times. Heads for a 1, 
tails for a 0. The way a coin lands depends on many variables 
outside your control. 


Since computers operate with very precise regularity, they cannot 
produce a true random sequence. However, it is easy to produce a 
repeating pseudo-random sequence which has many characteristics 
similar to those of a true random sequence. How soon the sequence 
repeats depends on the precision to which the values are calculated. 
A good pseudo-random number generator working to 16-bit 
precision should produce 65536 different numbers before repeating. 


Pseudo-random sequences are useful because, within one cycle, every
value occurs just once and apparently has no relation to the previously
generated value. Programs can be tested with all possible data, in a
way which doesn’t rely on regularity, by going through an entire
sequence of ‘random’ numbers.


The method of calculating pseudo-random numbers usually involves 
multiplying the previous number, or an initial value known as a seed, 
by a carefully chosen factor and then adding a constant — again 
chosen very carefully to eliminate undesirable effects. The result is 
then divided by another value and the remainder taken as the new 
random number. 
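
The multiply, add and take-remainder method just described can be
sketched in a few lines. The following is an illustrative Python model
rather than any routine from Sub Set; the function name lcg and the small
parameters used in the example (a = 5, c = 3, m = 16) are assumptions
chosen only so that the whole cycle is short enough to inspect.

```python
# Illustrative sketch of the multiply-add-remainder method (not a Sub Set routine).
from itertools import islice


def lcg(seed, a, c, m):
    """Yield successive values of value = (a * value + c) mod m."""
    value = seed
    while True:
        value = (a * value + c) % m
        yield value


if __name__ == "__main__":
    # Toy parameters: every value 0..15 appears once before the cycle repeats.
    print(list(islice(lcg(0, 5, 3, 16), 16)))
```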


Other generating methods can be used. For example, a sequence of 
binary numbers which exhibits random spread can be formed by 
manipulating individual bits from the previous number. Values read 
from independently changing hardware devices are sometimes used 
where speed or program length is more important than the degree of 
randomness. 


Many pseudo-random number routines have appeared in Sub Set
but, strangely, most of them have been for the Z-80. The routines in
this chapter therefore consist of two routines contributed for the
6502, RND16B and RANDI, and three direct conversions from the
Z-80 originals, RND16, RND32 and RND31. The discussion of
random number methods does not depend on the language in which
the routines discussed were written.


RANDOM BITS VIA THE 6522 


RND16B by T. A. Browning makes use of the two 16-bit counters of 
the 6522 Versatile Interface Adapter (VIA) to compute 16 random 
bits. 


I cannot say just how random this method is. I suspect that if 
RND16B is used from inside a loop which operates in a regular, fixed 
number of clock cycles, then it might not be at all random. The loop 
iteration time could well coincide with the counter decrement time. 


For one-off random number requests, RND16B does provide a very 
fast method of obtaining a ‘non-calculated’ value. 


:JOB       To compute 16 random bits using a hardware timer
:          device with two independent 16-bit counters.
:ACTION    Set counters to decrement continuously without
:          interrupt.
:          Read contents of timer 2 into registers.
:          Exclusive-or hi-byte with lo-byte of timer 1.
:          Exclusive-or lo-byte with hi-byte of timer 1.
:          Write contents of registers to timer 2.
:CPU       6502
:HARDWARE  6522 Versatile Interface Adapter (VIA).
:SOFTWARE  None.
:INPUT     None.
:OUTPUT    A,Y contains an unknown (random) number.
:          Sign and Zero flags show status of byte in A.
:          VIA counters are free-running.
:          VIA port input latches are disabled.
:ERRORS    The randomness of the result depends entirely on
:          the regularity with which RND16B is called.
:REG USE   A Y P
:STACK USE None.
:RAM USE   None.
:LENGTH    25
:CYCLES    38
:CLASS 2   -discreet *interruptable *promable
:-****-    *reentrant *relocatable -robust

RND16B LDA #0        :Clear VIA ACR putting timers      A9 00
       STA ACR       :in one-shot mode.                 8D lo hi
       LDA T2C-L     :Exclusive-or lo-bytes with        AD lo hi
       EOR T1C-H     :hi-bytes of other counter         4D lo hi
       TAY           :with Y as low byte result         A8
       LDA T2C-H     :and A as high byte result.        AD lo hi
       EOR T1C-L     :                                  4D lo hi
       STY T2C-L     :Write result into counter of      8C lo hi
       STA T2C-H     :timer 2, letting it decrement.    8D lo hi
       RTS           :Exit, random bits in AY.          60


DOUBLY RANDOM 


RANDI is a routine of mine which I originally wrote for the 6502 
processor, converted to Z-80 and submitted to Sub Set as a ‘double 
datasheet’. 


The 6502 and Z-80 differ greatly in the way that BCD adjustment of 
arithmetic results takes place. Add and subtract operations on the 
Z-80 need a following ‘DAA’ instruction to adjust the value in the 
accumulator to BCD. Consequently, programming for 2-mode 
operation is neither easy nor quick since the routine has to test which 
mode is in force (I used the Carry flag to indicate this) and branch to
the appropriate section. 


The method of BCD adjustment on the 6502, with one bit in the
status register determining whether binary or BCD arithmetic is
currently being performed, is perfect for 2-mode routines.
Although it normally carries the overheads of having to ‘CLD’ to 
ensure binary (or ‘SED’ to ensure decimal) operations are performed 
correctly, it does allow routines to be written which operate in either 
mode depending on the input state of this flag. 


:JOB       To compute a pseudo-random integer as a value in
:          the series: R(i+1) = (R(i) * a + c) mod m.
:          For binary: a = 257, c = 41 and m = 2**32.
:          For BCD: a = 101, c = 29 and m = 10**8.
:ACTION    Temp = constant.
:          FOR variable low order byte to high order byte:
:          [ Accumulator = temp.
:            Temp = byte.
:            Byte = byte + accumulator. ]
:CPU       6502
:HARDWARE  None.
:SOFTWARE  None.
:INPUT     M0,1,2,3 contains seed or previous random value.
:          (Low order byte in M0, high order in M3.)
:          Decimal mode flag (D) set or reset accordingly.
:OUTPUT    M0,1,2,3 contains new random number.
:          X = 0. Y = high order byte of previous number.
:          P is unchanged.
:ERRORS    Arithmetic error if the random variable does not
:          contain valid BCD digits on entry with D = 1.
:REG USE   P X Y
:STACK USE 2
:RAM USE   M0 M1 M2 M3
:LENGTH    20
:CYCLES
:CLASS 2   -discreet *interruptable *promable
:-****-    *reentrant *relocatable -robust

RVAR   = M0          :4-byte stored random number variable.
CNST   = $29         :Constant to be added to RVAR * 257.
                     :If D = 0 (binary mode) CNST has the
                     :value 41 (decimal), if D = 1 (decimal
                     :mode) it has the value 29 (decimal).

RANDI  PHP           :Save flags                        08
       PHA           :and A for use in RANDI.           48
       CLC           :Ensure no carry to first add.     18
       LDY #CNST     :Constant as lo-byte of shifted    A0 29
                     :variable added to variable.
       LDX #-4       :Index from low order byte.        A2 FC
RNDLP  TYA           :Get RVAR last byte, shifted.      98
       LDY RVAR+4,X  :Pick up next RVAR byte to shift.  B4 M4
       ADC RVAR+4,X  :Add in shifted byte to current    75 M4
       STA RVAR+4,X  :byte, i.e. part RVAR * 257        95 M4
       INX           :Repeat for all four bytes of      E8
       BNE RNDLP     :random variable.                  D0 F6
       PLA           :Restore A and P, leaving          68
       PLP           :X = 0 and Y = previous hi-byte.   28
       RTS           :Exit, new RVAR formed.            60

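
The byte-shifting trick in RANDI can be checked against the formula in
the datasheet with a short model. The sketch below is hypothetical Python,
not part of the original routine; it covers the binary mode (D = 0) only,
and the name randi_binary is an assumption made for illustration.

```python
# Hypothetical Python model of RANDI in binary mode (D = 0), illustration only.

def randi_binary(rvar, const=0x29):
    """rvar is a list of 4 bytes, low order first; returns the new 4 bytes."""
    carry = 0                 # CLC
    temp = const              # LDY #CNST
    result = []
    for byte in rvar:         # low order byte to high order byte
        acc = temp            # TYA: previous byte (or the constant)
        temp = byte           # LDY: remember this byte for the next pass
        total = byte + acc + carry        # ADC
        result.append(total & 0xFF)       # STA
        carry = total >> 8
    return result


if __name__ == "__main__":
    print([hex(b) for b in randi_binary([0x78, 0x56, 0x34, 0x12])])
```

Adding each byte to the byte below it is the same as adding the variable
to a copy of itself shifted up by 8 bits, which gives R * 257; the
constant enters as the byte "below" the lowest byte.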
RANDOM THEORY 


The subject of random numbers provoked a lengthy discussion in 
Sub Set about the usefulness of certain constants. 


The first formula based random number generator to appear in Sub
Set was published in July 1981. It was based on the formula:


    X(i+1) = (a * X(i) + C) mod M


with


    M = 2**16 (i.e. 65536)
    C = any odd number
    a = 1 mod 4 (1, 5, 9, 13, etc.)


An ‘a’ of 257 was used in the routine, since it is an easy multiplier,
but 4 other possible values of ‘a’ were given: 765, 889, 989 and 2009.


In October 1981, Brian Steel, referring to Knuth, The Art of
Computer Programming, Volume 2, Chapter 3, wrote of two
important points concerning the randomness of the method and
values:


Firstly, in a mixed congruential sequence:

    X(i+1) = (a * X(i) + C) mod M

it is important that ‘potency’ is high. Potency is the power to which
(a - 1) must be raised before it is directly divisible by M (which it
will always be eventually, if your other criteria are met). 257 is a
bad ‘a’, since a - 1 = 256 and 256**2 = 65536 or 2**16: thus potency is
only 2. If you choose ‘a’ as 5 mod 8 you will ensure higher potency
(e.g. 5, 13, 21, etc.).


Secondly, ‘a’ should be greater than 0.01M but less than 0.99M for
good results. 257 is less than 0.01 * 2**16 and therefore fails here too.


By observing these guidelines you have a better chance of getting a
good PRANG [pseudo-random number generator], but sadly it
can’t be guaranteed on these (plus your) criteria alone. However, no
good PRANG has a potency of less than 3, and 5 or more is desirable
for this measurement.


In February 1982, Sub Set published another Z-80 datasheet which 
used an ‘a’ of 1021 and thus satisfied Brian Steel’s potency 
requirement. 


The contributor, Andrew Bain, also gave a simple method of 
calculating ‘potency’ when M = 65536. Continue to divide (a—1) by 2 
until you obtain an odd number result. Divide the number of 
divisions made into 16 and the integer result is the potency. 
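
Potency as Brian Steel defines it can also be found by brute force. The
sketch below is an illustrative Python function, not something published
in Sub Set; the name potency and the limit argument are assumptions. It
simply raises (a - 1) to successive powers modulo M until the result is
zero.

```python
# Illustrative brute-force potency calculation (not a Sub Set routine).

def potency(a, m=65536, limit=64):
    """Smallest s with (a - 1)**s divisible by m, or None if none up to limit."""
    power = 1
    for s in range(1, limit + 1):
        power = (power * (a - 1)) % m
        if power == 0:
            return s
    return None


if __name__ == "__main__":
    for multiplier in (257, 1021, 1509):
        print(multiplier, potency(multiplier))
```

For the multipliers mentioned in the text this gives a potency of 2 for
257 and 8 for 1021.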


In June 1982, Dr. Brian Ripley joined in the randomness argument:


You ask if there are better multipliers ‘a’. There are! The best
reference is indeed Knuth, but the second (1981) edition is needed.
He stresses the importance of the ‘spectral test’ which identifies the
problems with all generators known to be bad for other reasons.
None of a = 257, 765, 889, 989, 1021 or 2009 passes acceptably.
High potency is a necessary but not sufficient condition for a good
generator. The choice of a = 257 is dreadful, and a = 765 and
1021 are bad. A search through multipliers from 1 to 2001 with a
= 1 mod 4 (so the period is M = 2**16) suggests the following, in
roughly decreasing order of merit: 293, 389, 1509, 249, 1785, 685.



RANDOM NUMBERS 


Little seems to be known about the choice of C except that it should
be odd. All the more powerful tests look at the whole sequence,
which doesn’t depend on C. The choice C = 41 seems as good as
any.


A more serious point is whether 16 bits are good enough. It seems
that for serious work they are not. The spectral test looks at the
lattice of successive k-tuples (X(i)/M, X(i+1)/M, ..., X(i+k)/M). In
four dimensions the points of the lattice are at least 1/16 apart, which
is very different from the uniform scatter they should have. It would
be worth the effort of producing a 32-bit generator. Try a = 69069,
71365 or 100485.


Unfortunately this is a rather technical subject area in which many
mistakes have been made. (See, e.g., P. J. Brown’s comments in
‘Writing Interactive Compilers and Interpreters’.) All I can say is
that it is one of my research interests and the algorithm used is an
improvement on Knuth’s.


In the same issue doubts were cast by Ettrick Thomson on the 
randomness of the sequences produced by the algorithm when using 
a modulus which is a power of 2. 


For 16-bit numbers and M = 2**16 (M = 65536), bit 15 has a period of
2**16, bit 14 a period of 2**15, and so on down to bit 0 which has a
period of 2**1. The least significant byte repeats every 256 numbers
and all numbers in the sequence are alternately odd and even.
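
Ettrick Thomson's observation is easy to confirm by experiment. The
sketch below is an illustrative Python check, not part of the original
correspondence; the function name bit_period and the choice of a = 1509
and C = 41 (the values used later in RND16) are assumptions made for the
example. It generates one full cycle and finds the shortest repeat length
of a single bit.

```python
# Illustrative check of the period of individual bits when M = 2**16.

def bit_period(bit, a=1509, c=41, m=65536, seed=0):
    """Return the period of one bit of X(i+1) = (a * X(i) + c) mod m."""
    sequence = []
    x = seed
    for _ in range(m):                    # one full cycle of the generator
        sequence.append((x >> bit) & 1)
        x = (a * x + c) % m
    period = 1
    while period < m:                     # the period must be a power of 2
        if all(sequence[i] == sequence[(i + period) % m] for i in range(m)):
            return period
        period *= 2
    return m


if __name__ == "__main__":
    for bit in (0, 7, 15):
        print(bit, bit_period(bit))
```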


16-BIT RANDOM 


RND16 (a 6502 conversion of a Z-80 Sub Set datasheet) uses an ‘a’ of
1509, one of the numbers thought good by Dr. Brian Ripley. 


The normal method used for 16-bit multiplication is quite slow but 
when the multiplier is always the same, it can be factorised and the 
multiplication carried out by binary shifting and addition which is 
relatively quick. 1509 factorises nicely to various nested powers of 2 
and the routine executes in less than half the time it would take using 
16-bit long multiplication. 


:JOB       To generate a 16-bit pseudo-random number using
:          the series,
:          R(i+1) = (R(i) * 1509 + 41) mod 2**16.
:ACTION    The multiplier, 1509, is factorised to powers of
:          2 to allow multiplication by shifting and binary
:          addition. 1509R is factorised,
:          ((32R + (16R - R)) * 8 + R) * 4 + R.
:          Taking only the low 2 bytes ensures mod 2**16.
:CPU       6502
:HARDWARE  None
:SOFTWARE  None
:INPUT     M0,1 contains the previous random value or seed.
:OUTPUT    M0,1 contains the new pseudo-random number.
:ERRORS    None.
:REG USE   None
:STACK USE 5
:RAM USE   M0 M1
:LENGTH    131
:CYCLES    259 or 263
:CLASS 1   *discreet *interruptable *promable
:******    *reentrant *relocatable *robust

RNDV   = M0          :2-byte stored random number variable.
COPY   = M2          :2-bytes for storing last random number.
CNST   = 41          :Constant to add after multiplication.

RND16  PHP           :Save flags and                    08
       PHA           :accumulator for use in RND16.     48
       LDA COPY+1    :Save two bytes of page zero       A5 M3
       PHA           :to hold previous                  48
       LDA COPY      :random number for                 A5 M2
       PHA           :adding/subtracting.               48
       CLD           :Ensure binary arithmetic.         D8
       LDA RNDV      :Save previous                     A5 M0
       STA COPY      :random number (R)                 85 M2
       LDA RNDV+1    :in COPY for adding/subtracting    A5 M1
       STA COPY+1    :in multiplication by shifts.      85 M3
       ASL RNDV      :Shift left by 4 bits              06 M0
       ROL RNDV+1    :to multiply by 16                 26 M1
       ASL RNDV      :giving,                           06 M0
       ROL RNDV+1    :RNDV = R * 16.                    26 M1
       ASL RNDV      :                                  06 M0
       ROL RNDV+1    :                                  26 M1
       ASL RNDV      :                                  06 M0
       ROL RNDV+1    :                                  26 M1
       LDA RNDV+1    :Save 16R on stack (hi-byte)       A5 M1
       PHA           :and in A (lo-byte).               48
       LDA RNDV      :                                  A5 M0
       ASL RNDV      :One bit left shift,               06 M0
       ROL RNDV+1    :RNDV = R * 32.                    26 M1
       CLC           :Prepare to add.                   18
       ADC RNDV      :Add R * 16 back                   65 M0
       STA RNDV      :in to R * 32                      85 M0
       PLA           :(getting hi-byte off stack)       68
       ADC RNDV+1    :giving,                           65 M1
       STA RNDV+1    :RNDV = R * 48.                    85 M1


       SEC           :Prepare to subtract.              38
       LDA RNDV      :Subtract saved R from             A5 M0
       SBC COPY      :current RNDV                      E5 M2
       STA RNDV      :giving,                           85 M0
       LDA RNDV+1    :RNDV = R * 47.                    A5 M1
       SBC COPY+1    :                                  E5 M3
       STA RNDV+1    :                                  85 M1
       ASL RNDV      :Shift left by 3 bits              06 M0
       ROL RNDV+1    :to multiply by 8                  26 M1
       ASL RNDV      :giving,                           06 M0
       ROL RNDV+1    :RNDV = 47R * 8 or,                26 M1
       ASL RNDV      :RNDV = R * 376.                   06 M0
       ROL RNDV+1    :                                  26 M1
       CLC           :Prepare to add.                   18
       LDA RNDV      :Add saved R to                    A5 M0
       ADC COPY      :current RNDV                      65 M2
       STA RNDV      :giving,                           85 M0
       LDA RNDV+1    :RNDV = R * 377.                   A5 M1
       ADC COPY+1    :                                  65 M3
       STA RNDV+1    :                                  85 M1
       ASL RNDV      :Shift left by 2 bits              06 M0
       ROL RNDV+1    :to multiply by 4 giving,          26 M1
       ASL RNDV      :RNDV = 377R * 4 or,               06 M0
       ROL RNDV+1    :RNDV = R * 1508.                  26 M1
       CLC           :Prepare to add.                   18
       LDA RNDV      :Add saved R to                    A5 M0
       ADC COPY      :current RNDV                      65 M2
       STA RNDV      :giving,                           85 M0
       LDA RNDV+1    :RNDV = R * 1509.                  A5 M1
       ADC COPY+1    :                                  65 M3
       STA RNDV+1    :                                  85 M1
       CLC           :Prepare to add.                   18
       LDA #CNST     :Add constant to                   A9 29
       ADC RNDV      :current RNDV                      65 M0
       STA RNDV      :giving,                           85 M0
       BCC ENDR16    :RNDV = R * 1509 + 41              90 02
       INC RNDV+1    :mod 2**16 because 2-bytes only.   E6 M1
ENDR16 PLA           :Restore saved                     68
       STA COPY      :page zero                         85 M2
       PLA           :two bytes.                        68
       STA COPY+1    :                                  85 M3
       PLA           :Restore accumulator               68
       PLP           :and flags, then exit with         28
       RTS           :new random number in RNDV.        60

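
The chain of shifts, additions and one subtraction can be verified
against a direct multiplication. The following sketch is a hypothetical
Python model, not part of the original datasheet; the name rnd16 is an
assumption. Each line corresponds to one stage of the factorisation
((32R + (16R - R)) * 8 + R) * 4 + R, with every intermediate result kept
to 16 bits.

```python
# Hypothetical Python model of the RND16 shift-and-add multiply (illustration only).

def rnd16(r):
    """Return (r * 1509 + 41) mod 2**16 using only shifts, adds and a subtract."""
    mask = 0xFFFF
    v = (r << 4) & mask            # 16R
    v = ((v << 1) + v) & mask      # 32R + 16R = 48R
    v = (v - r) & mask             # 47R
    v = (v << 3) & mask            # 376R
    v = (v + r) & mask             # 377R
    v = (v << 2) & mask            # 1508R
    v = (v + r) & mask             # 1509R
    return (v + 41) & mask         # add the constant


if __name__ == "__main__":
    print(rnd16(1), rnd16(12345))
```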
32-BIT RANDOM 


RND32 (again a 6502 conversion from the Z-80) uses another of Dr.
Brian Ripley’s good multipliers, 69069. This too factorises quite well
and could have been dealt with as a sequence of shifts and additions
in the same way as RND16. However, as well as having a regular
decimal pattern it has a repeating bit pattern making the use of a
loop possible.


69069 in hexadecimal is $10DCD and this splits to $10101, which 
can be effected by byte shifting, and $CCC which is achieved in 
RND32 by an iterative doubling process — this adds in an extra 1 in 
two consecutive iterations out of every four. 


The loop count used is negative and counts upwards to zero because 
this produces a reset bit 1 in the count in the first 2 of every 4 
iterations. A positive count-down would have reset bits in the first 
and last of every four iterations. 


:= RND32   32-bit pseudo-random number generator.
:JOB       To generate a 32-bit pseudo-random number using
:          the series,
:          R(i+1) = (R(i) * 69069 + 41) mod 2**32.
:ACTION    Uses the factorisation of 69069 =
:          3 * (2**10 + 2**6 + 2**2) + 2**16 + 2**8 + 2**0,
:          to multiply by shift and addition.
:          Restricting the partial and final result to
:          4 bytes ensures "mod 2**32".
:CPU       6502
:HARDWARE  None
:SOFTWARE  None
:INPUT     M0 to M3 contains previous random value or seed.
:          (Low order byte in M0, high order in M3.)
:OUTPUT    M0 to M3 contains the new pseudo-random number.
:ERRORS    None.
:REG USE   None.
:STACK USE 7
:RAM USE   M0 M1 M2 M3
:LENGTH    138
:CYCLES    808
:CLASS 1   *discreet *interruptable *promable
:******    *reentrant *relocatable *robust

RNUM   = M0          :4-byte stored random variable.
CORN   = M4          :4-byte store for random number copy.
CNSTNT = 41          :Constant to be added in.

RND32  PHP           :Save flags,                       08
       PHA           :accumulator                       48
       TXA           :and index X                       8A
       PHA           :for use in RND32.                 48
       CLD           :Ensure binary arithmetic.         D8
       LDX #0        :Index from low order bytes.       A2 00
RPSHLP LDA CORN,X    :Save 4 bytes of page zero         B5 M4
       PHA           :on stack and                      48
       LDA RNUM,X    :copy previous random number       B5 M0
       STA CORN,X    :(R) to it for adding in.          95 M4
       INX           :Repeat for low order              E8
       CPX #4        :to high order bytes               E0 04
       BNE RPSHLP    :of RNUM and CORN.                 D0 F4
       LDX #-11      :Set loop counter, 11 passes.      A2 F5
SHFTLP ASL RNUM      :RNUM = RNUM * 2.                  06 M0
       ROL RNUM+1    :                                  26 M1
       ROL RNUM+2    :Loop will produce: RNUM =         26 M2
       ROL RNUM+3    :3R * (2**10 + 2**6 + 2**2).       26 M3
       TXA           :If bit 1 of loop count            8A
       AND #2        :is set (1) then                   29 02
       BNE SLINX     :skip to loop end test.            D0 19
       CLC           :Else, prepare to add,             18
       LDA RNUM      :RNUM = RNUM + R.                  A5 M0
       ADC CORN      :                                  65 M4
       STA RNUM      :(This occurs on passes            85 M0
       LDA RNUM+1    :1, 4, 5, 8 and 9.                 A5 M1
       ADC CORN+1    :Were X to count down              65 M5
       STA RNUM+1    :from +11 then it                  85 M1
       LDA RNUM+2    :would occur on passes             A5 M2
       ADC CORN+2    :3, 4, 7, 8 and 11                 65 M6
       STA RNUM+2    :giving an incorrect               85 M2
       LDA RNUM+3    :factor of 2457                    A5 M3
       ADC CORN+3    :instead of the                    65 M7
       STA RNUM+3    :correct 3276.)                    85 M3
SLINX  INX           :Repeat 11 times, ending           E8
       BNE SHFTLP    :with RNUM = 3276R.                D0 D7
       CLC           :Prepare to add.                   18
       LDA RNUM      :Add R giving,                     A5 M0
       ADC CORN      :RNUM = RNUM + R * 2**0.           65 M4
       STA RNUM      :                                  85 M0
       LDA RNUM+1    :This and the next two             A5 M1
       ADC CORN+1    :additions will produce            65 M5
       STA RNUM+1    :RNUM = RNUM +                     85 M1
       LDA RNUM+2    :R * (2**0 + 2**16 + 2**8) +       A5 M2
       ADC CORN+2    :constant 41.                      65 M6
       STA RNUM+2    :                                  85 M2
       LDA RNUM+3    :                                  A5 M3
       ADC CORN+3    :                                  65 M7
       STA RNUM+3    :                                  85 M3
       CLC           :Prepare to add.                   18
       LDA RNUM+2    :Add low 2 bytes of R              A5 M2
       ADC CORN      :into high 2 bytes of RNUM         65 M4
       STA RNUM+2    :giving,                           85 M2
       LDA RNUM+3    :RNUM = RNUM + R * 2**16.          A5 M3
       ADC CORN+1    :                                  65 M5
       STA RNUM+3    :                                  85 M3
       CLC           :Prepare to add.                   18
       LDA RNUM      :Add constant to lo-byte RNUM,     A5 M0
       ADC #CNSTNT   :byte 0,R to byte 1,RNUM,          69 29
       STA RNUM      :byte 1,R to byte 2,RNUM           85 M0
       LDA RNUM+1    :and                               A5 M1
       ADC CORN      :byte 2,R to byte 3,RNUM           65 M4
       STA RNUM+1    :giving,                           85 M1
       LDA RNUM+2    :RNUM = RNUM + R * 2**8 + 41.      A5 M2
       ADC CORN+1    :                                  65 M5
       STA RNUM+2    :                                  85 M2
       LDA RNUM+3    :                                  A5 M3
       ADC CORN+2    :                                  65 M6
       STA RNUM+3    :                                  85 M3
       LDX #4        :Index from high order byte.       A2 04
RPULLP PLA           :Restore saved page zero           68
       STA CORN-1,X  :bytes used for R copy.            95 M3
       DEX           :Repeat for high order             CA
       BNE RPULLP    :byte to low order.                D0 FA
       PLA           :Restore                           68
       TAX           :registers                         AA
       PLA           :and                               68
       PLP           :flags used in RND32 then exit     28
       RTS           :with new random number in RNUM.   60

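
The way the negative loop count selects the passes on which R is added
can be followed in a short model. The sketch below is hypothetical
Python, not part of the original datasheet; the name rnd32 is an
assumption. It repeats the loop of 11 doubling passes and the three
byte-shifted additions, keeping every result to 32 bits.

```python
# Hypothetical Python model of the RND32 multiply by 69069 (illustration only).

def rnd32(r):
    """Return (r * 69069 + 41) mod 2**32 by the loop-and-shift method."""
    mask = 0xFFFFFFFF
    num = r
    for count in range(-11, 0):            # X counts -11 up to -1
        num = (num << 1) & mask            # RNUM = RNUM * 2
        if ((count & 0xFF) & 2) == 0:      # bit 1 of the 8-bit count clear:
            num = (num + r) & mask         # RNUM = RNUM + R
    # num is now 3276 * R ($CCC * R); add the $10101 * R part and the constant.
    num = (num + r) & mask                             # R * 2**0
    num = (num + ((r & 0xFFFF) << 16)) & mask          # R * 2**16
    num = (num + ((r & 0xFFFFFF) << 8) + 41) & mask    # R * 2**8 + 41
    return num


if __name__ == "__main__":
    print(rnd32(1), rnd32(123456789))
```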
31-BIT RANDOM 


The last of these conversions from the Z-80, RND31, solves the 
periodicity problem Ettrick Thomson found to occur when using a 
modulus that is a power of 2. 


The algorithm of RND31, which rotates the top 9 bits with the
bottom 22 bits, is slightly difficult to relate to the formula:


    R(i+1) = (R(i) * (2**9 + 1)) mod (2**31 - 1)


but it does indeed work and is very fast. (Multiplying by 2**9 modulo
2**31 - 1 is the same as rotating the 31-bit value left by 9 bits, and
the extra + 1 adds the original value back in.) The sequence repeats
after (2**31 - 2) numbers, giving all possible values except 0 and
$7FFFFFFF.


The primitive root (2**9 + 1) may be too small to produce good
random numbers. A better choice would be (2**17 + 1). You can use
the algorithm of RND31 for this larger element: simply rotate 2 bytes
+ 1 bit from high order to low order instead of the 1 byte + 1 bit that
is rotated in RND31.


:JOB       To generate a 31-bit pseudo-random number using
:          the series,
:          R(i+1) = (R(i) * a) mod m,
:          m = 2**31 - 1 (a Mersenne prime)
:          a = 2**9 + 1 (a primitive root of m).
:ACTION    Move RN bits 21 - 0 into TEMP bits 30 - 9.
:          Move RN bits 30 - 22 into TEMP bits 8 - 0.
:          Clear TEMP bit 31.
:          RN = RN + TEMP.
:          IF bit 31,RN = 1 THEN:
:          [ Bit 31,RN = 0. RN = RN + 1. ]
:CPU       6502
:HARDWARE  None.
:SOFTWARE  None.
:INPUT     RN contains previous random number or seed.
:OUTPUT    RN contains new random number.
:          P, A, X and M0, M1, M2, M3 are changed.
:ERRORS    Arithmetic error if D = 1 (decimal mode).
:          A re-entering routine could overwrite a
:          partially completed new random number resulting
:          in an out-of-sequence number being generated by
:          the re-entered routine.
:REG USE   P A X
:STACK USE None.
:RAM USE   M0 M1 M2 M3
:          "RN" - 4 bytes containing 31-bit pseudo-random
:          number, not necessarily appended to RND31.
:LENGTH    71 (not including 4-byte RN).
:CYCLES    Minimum: 143. Maximum: 207.
:CLASS 2   -discreet *interruptable -promable
:-*----    -reentrant -relocatable -robust

TEMP   = M0          :4-byte temporary store for rotated RN
                     :before adding back to RN.

RN     .BYTE dd      :31-bit seed for pseudo-random number.
       .BYTE cc      :Stored anywhere in memory.
       .BYTE bb      :High order byte at
       .BYTE aa      :RN+3, must be less than 128 ($80).

:...1st step - get RN bits 21 - 0 into TEMP bits 30 - 9,
:...clearing TEMP bits 31 and 8.

RND31  LDA RN        :Move RN bits 7 to 0 into          AD lo hi
       ASL A         :Carry and TEMP bits 15 to 9,      0A
       STA TEMP+1    :clearing TEMP bit 8 for           85 M1
       LDA RN+1      :later receiving RN bit 30.        AD lo hi
       ROL A         :Move RN 15 to 8 and C             2A
       STA TEMP+2    :into C and TEMP 23 to 16.         85 M2
       LDA RN+2      :Move RN bits 21 to 16 and C       AD lo hi
       ROL A         :into TEMP bits 30 to 24,          2A
       AND #$7F      :clearing TEMP 31.                 29 7F
       STA TEMP+3    :22 bits moved to TEMP.            85 M3

:...Next step - get RN bits 30 - 22 into TEMP bits 8 - 0,
:...giving completed rotation in TEMP.

       LDA RN+3      :Move RN bits 31 to 24             AD lo hi
       STA TEMP      :into TEMP bits 7 to 0.            85 M0
       LDA RN+2      :Get RN byte containing bits       AD lo hi
       ROL A         :23 and 22. Shift left twice       2A
       ROL TEMP      :discarding unused hi-bit          26 M0
       ROL A         :so TEMP bits 7 to 0 get           2A
       ROL TEMP      :RN bits 29 to 22 and              26 M0
       BCC RNADD     :RN bit 30 goes to TEMP            90 02
       INC TEMP+1    :bit 8.                            E6 M1

:...Add rotated value to original, exit if < (2**31 - 1).

RNADD  CLC           :Prepare for addition.             18
       LDX #-4       :Index from low order bytes.       A2 FC
RNADLP LDA RN-252,X  :Do 32-bit addition of             BD lo hi
       ADC TEMP+4,X  :rotated number in TEMP            75 M4
       STA RN-252,X  :added to original in RN           9D lo hi
       INX           :with result to RN.                E8
       BNE RNADLP    :                                  D0 F5
       TAX           :Test high order byte for          AA
       BPL RNEXIT    :bit 31 set, exit if not.          10 0F

:...Correct for RN > (2**31 - 1). RN = (2**31 - 1) is not
:...possible so correction will not set bit 31.

       AND #$7F      :Clear bit 31 of new random        29 7F
       STA RN+3      :number in RN.                     8D lo hi
       LDX #0        :Index from low order byte.        A2 00
RNINC  INC RN,X      :Increment RN byte,                FE lo hi
       BNE RNEXIT    :exit if no carry,                 D0 03
       INX           :else index next higher byte       E8
       BNE RNINC     :and repeat until no carry.        D0 F8
RNEXIT RTS           :Exit, RN = new random value.      60

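
The link between the rotation and the formula can be checked with a
short model. The sketch below is hypothetical Python, not part of the
original datasheet; the names rnd31 and M31 are assumptions made for
illustration. It rotates the 31-bit value left by 9 bits, adds the
original, and applies the same correction as the routine when bit 31 of
the sum is set.

```python
# Hypothetical Python model of the RND31 rotate-and-add method (illustration only).
M31 = (1 << 31) - 1      # modulus 2**31 - 1


def rnd31(rn):
    """Return (rn * (2**9 + 1)) mod (2**31 - 1) for 0 < rn < 2**31 - 1."""
    rotated = ((rn << 9) & M31) | (rn >> 22)   # RN bits 21-0 up, bits 30-22 down
    total = rn + rotated                        # RN = RN + TEMP
    if total & (1 << 31):                       # bit 31 set:
        total = (total & M31) + 1               # clear it and add 1
    return total


if __name__ == "__main__":
    print(rnd31(1), rnd31(0x12345678))
```

Clearing bit 31 and adding 1 is the same as subtracting 2**31 - 1, which
is why no division is needed to take the remainder.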
Extracting Roots


The four routines in this chapter will find only the integer square
or cube roots of 16-bit or 32-bit values. The root returned from
them is that of the largest square or cube less than or equal to the
input value. The remainder (input value minus largest square/cube)
is also returned.





The methods used can be extended to calculate roots to any precision 
of binary fraction. To do this, continue the iterative process (making 
sure that your page zero variables are lengthened to take the longer 
results and prevent overflow) for as many root bits as you need. The 
low order bits in excess of the numbers quoted in each routine will 
be the binary fraction. 


SQUARE ROOTS 


SQR15 and SQR16 are different entry points to a routine by John
Kerr which calculates the 8-bit integer square root and 9-bit
remainder (square - root**2) of an input 16-bit square. When entered
at SQR15, an initial test is made on the sign of the input value and if
found to be negative then the routine aborts. SQR16 acts on a full
16-bit unsigned, or absolute, value.


SQR31 and SQR32 also by John Kerr do the same job as SQR15 and
SQR16 but on 32-bit signed (positive) and unsigned values. The
integer root of a 32-bit square is 16 bits and the remainder 17 bits.


An analysis by John Kerr of the root extraction method is given later 
in this chapter. 




:= SQR15   Square root of 16-bit signed (positive) value.
:= SQR16   Square root of 16-bit unsigned (absolute) value.
:JOB       To extract the integer square root of a 16-bit
:          value, either unsigned or signed (negative
:          values returned unprocessed).
:ACTION    IF positive value:
:          THEN:
:          [ Initialise part-root to 0.
:            FOR root bits count, 8 TO 1:
:            [ Subtrahend = part-root * 2**(count * 2)
:                           + 1 * 2**((count - 1) * 2).
:              IF subtrahend =< square:
:              THEN: [ Square = square - subtrahend.
:                      Part-root = part-root * 2 + 1. ]
:              ELSE: [ Part-root = part-root * 2. ] ]
:            Set root-done flag. ]
:          ELSE: [ Set root-not-done flag. ]
:CPU       6502
:HARDWARE  None
:SOFTWARE  None
:INPUT     M1,2 contains the 16-bit square.
:OUTPUT    SQR15: N = 1: Negative input value - no root.
:          A = M2. P X Y M1 M2 M3 M4 unchanged.
:          N = 0: As for SQR16.
:          SQR16: N = 0. C = 0. Y = 0. A = X = M3.
:          8-bit integer square root in M1,2 (M2 = 0).
:          9-bit remainder in M3,4 (bits 7-1,M4 = 0).
:ERRORS    Arithmetic error if D = 1 (decimal mode).
:REG USE   P A X Y
:STACK USE None.
:RAM USE   M1 M2 M3 M4
:LENGTH    64
:CYCLES    Maximum 615.
:CLASS 2   -discreet *interruptable *promable
:-****-    *reentrant *relocatable -robust


:...Assign I/O page zero variables.

SQR     =    M1          :2-byte stored square (input) or 2-byte
                         :root with hi-byte = 0 (output).
REM     =    M3          :2-byte store for output 9-bit remainder.

:...Assign working pseudo-registers in page zero to coincide
:...with I/O and thus reduce initial & terminal data moves.

PARTRT  =    M1          :For working high byte of subtrahend, the
                         :same as partial root at each stage.
RSQLO   =    M3          :For lowest byte of 3-byte accumulator
                         :containing square for 2-bit shifting.
COUNT   =    M4          :For root bits count.

SQR15   LDA  SQR+1       :Test sign of input square, exit     A5 M2
        BMI  EXIT        :immediately if negative.            30 3B
SQR16   TAY              :Transfer input to low two bytes     A8
        LDA  SQR         :of reducing square accumulator      A5 M1
        STA  RSQLO       :in X, Y, M3, initialising high      85 M3
        LDX  #0          :byte X to 0.                        A2 00
        STX  PARTRT      :Initialise part-root to 0.          86 M1
        LDA  #8          :Set up count of 8 in count          A9 08
        STA  COUNT       :register for 8-bit root.            85 M4
PAIRLP  CPX  PARTRT      :Test if subtrahend will subtract    E4 M1
        BCC  NEXTPR      :from reducing square... first       90 0E
        BNE  PRTSUB      :test on hi-byte (part-root), then   D0 04
        CPY  #$40        :on constant set bit at 2-bits       C0 40
        BCC  NEXTPR      :below current part-root.            90 08
PRTSUB  TYA              :Subtrahend is less than square      98
        SBC  #$40        :so subtract it from current         E9 40
        TAY              :part of reducing square. Carry      A8
        TXA              :was set on entry to subtraction     8A
        SBC  PARTRT      :by test. Carry will be set on       E5 M1
        TAX              :exit by subtraction.                AA
NEXTPR  ROL  PARTRT      :Do part-root * 2 + result of sub.   26 M1
        ASL  RSQLO       :Shift left reducing square          06 M3
        TYA              :in X, Y, M3 to bring next           98
        ROL  A           :pair of bits from square            2A
        TAY              :into high byte. After 8             A8
        TXA              :iterations the low two bytes        8A
        ROL  A           :will be zero and the remainder      2A
        TAX              :of the root extraction will be      AA
        ASL  RSQLO       :in Carry and the high byte X.       06 M3
        TYA              :                                    98
        ROL  A           :(Count in M4 will have              2A
        TAY              :reduced to zero, leaving it         A8
        TXA              :ready to be remainder high byte     8A
        ROL  A           :by rotating remainder high bit      2A
        TAX              :in from Carry flag.)                AA
        DEC  COUNT       :Repeat until 8-bit                  C6 M4
        BNE  PAIRLP      :root found.                         D0 D8
        STX  REM         :Move 9-bit remainder to 2-byte      86 M3
        ROL  REM+1       :output variable and clear hi-byte   26 M4
        STY  SQR+1       :of root, 8-bit result in lo-byte.   84 M2
EXIT    RTS              :Exit, root done or N = 1.           60

:= SQR31   Square root of 32-bit signed (positive) value.
:= SQR32   Square root of 32-bit unsigned (absolute) value.

:JOB       To extract the integer square root of a 32-bit
:          value, either unsigned or signed (negative
:          values returned unprocessed).
:ACTION    IF positive value:
:          THEN:
:            [ Initialise part-root to 0.
:              FOR root bits count, 16 TO 1:
:                [ Subtrahend = part-root * 2**(count * 2)
:                               + 1 * 2**((count - 1) * 2).
:                  IF subtrahend =< square:
:                  THEN: [ Square = square - subtrahend.
:                          Part-root = part-root * 2 + 1. ]
:                  ELSE: [ Part-root = part-root * 2. ] ]
:              Set root-done flag. ]
:          ELSE: [ Set root-not-done flag. ]
:CPU       6502
:HARDWARE  None
:SOFTWARE  None
:INPUT     M1,2,3,4 contains the 32-bit square.
:OUTPUT    SQR31: N = 1: Negative input value - no root.
:                 A, X, Y and M1 to MA are unchanged.
:                 N = 0: As SQR32.
:          SQR32: N = 0. C = 0. A = M1. Y = X = 0.
:                 M9 = M1. MA = M2.
:                 16-bit integer square root in M1,2,3,4.
:                 17-bit remainder in M5,6,7,8.
:ERRORS    Arithmetic error if D = 1 (decimal mode).
:REG USE   P A X Y
:STACK USE None.
:RAM USE   M1 to MA
:LENGTH    75
:CYCLES    Maximum 4350.
:CLASS 2   -discreet *interruptable *promable
:          *reentrant *relocatable -robust


:...Assign I/O page zero variables.

SQRL    =    M1          :4-byte stored square (input) or 4-byte
                         :root with hi-word = 0 (output).
REML    =    M5          :4-byte store for 17-bit remainder out.

:...Assign working pseudo-registers in page zero to coincide
:...with I/O and thus reduce initial & terminal data moves.

RSQ     =    M1          :For 6-byte accumulator containing
                         :square reduced by shifts and subtracts.
MND     =    M4          :For working 3-byte minuend, highest 3
                         :bytes of RSQ.
SND     =    M8          :For working 3-byte subtrahend, highest
                         :2-bytes contain the partial root at
                         :each stage. Lowest is constant $40.

SQR31   BIT  SQRL+3      :Test sign of input square and       24 M4
        BMI  EXITL       :exit immediately if negative.       30 46
SQR32   LDA  #0          :Else prepare to clear working       A9 00
        LDX  #6          :registers.                          A2 06
CLEAR   STA  MND,X       :Clear MA,9 for partial root,        95 M4
        DEX              :M7 for output remainder,            CA
        BNE  CLEAR       :M6,5 for hi-word of RSQ.            D0 FB
        LDA  #$40        :Initialise constant set bit         A9 40
        STA  SND         :in SND lowest byte.                 85 M8
        LDY  #16         :Set count for 16-bit root.          A0 10
PRLOOP  LDX  #3          :Index from highest bytes.           A2 03
SUBTST  LDA  MND-1,X     :Test if subtrahend will             B5 M3
        CMP  SND-1,X     :subtract from minuend, skipping     D5 M7
        BCC  RSLT        :out appropriately, but              90 10
        BNE  SUBPRT      :looping for as many of the          D0 03
        DEX              :3 bytes as necessary.               CA
        BNE  SUBTST      :                                    D0 F5
SUBPRT  LDX  #-3         :Index from lowest bytes.            A2 FD
SUBLP   LDA  MND+3,X     :Subtraction will go, so             B5 M7
        SBC  SND+3,X     :subtract part-root + constant       F5 MB
        STA  MND+3,X     :from shifted part-square.           95 M7
        INX              :C = 1 on entry by test, C = 1       E8
        BNE  SUBLP       :on exit by subtraction.             D0 F7
RSLT    ROL  SND+1       :Do part-root * 2 + result of        26 M9
        ROL  SND+2       :subtraction. Leave C = 0.           26 MA
        TYA              :Ensure bit 7,A = 0.                 98
NPAIR   LDX  #-6         :Index from low order byte.          A2 FA
NBIT    ROL  RSQ+6,X     :Shift reducing square left          36 M7
        INX              :by one bit into minuend.            E8
        BNE  NBIT        :Repeat for all 6 bytes of RSQ.      D0 FB
        EOR  #$80        :Flip sign bit in A, so shift        49 80
        BMI  NPAIR       :done twice for bit-pair.            30 F5
        DEY              :Repeat until 16-bit root found,     88
        BNE  PRLOOP      :Leave C = remainder bit 16.         D0 D5
        LDA  SND+2       :Transfer found root from            A5 MA
        STA  SQRL+1      :subtrahend hi-word to output        85 M2
        LDA  SND+1       :variable lo-word. Output hi-word    A5 M9
        STA  SQRL        :cleared by RSQ shifting.            85 M1
        STY  REML+3      :REML,REML+1 = MND+1,MND+2, clear    84 M8
        ROL  REML+2      :hi-byte, store remainder bit 16.    26 M7
EXITL   RTS              :Exit, root done or N = 1.           60


CUBE ROOTS 


CURT16 and CURT32 are 6502 versions of original Z-80 Sub Set 
routines. They use the cube root method developed by John Kerr to 
calculate the integer cube roots of 16-bit and 32-bit unsigned values 
respectively. You can easily adapt both for signed number work by 
adding an initial test on the sign of the input value (bit 7 of the high 
order byte) and aborting if the bit is set, as in SQR31. 


John Kerr’s cube root extraction method is similar to that used in the 
square root programs but with two major differences. First, 3 bits 
instead of 2 are processed in each iteration, except in the first 
iteration when only 1 bit (CURT16) or 2 bits (CURT32) are shifted in to 
the accumulator, neither 16 nor 32 being a multiple of 3. Second, the 
partially formed cube root cannot be used directly as the subtrahend. 


Here is John’s account of how he adapted the square root method to 
work for cube roots. As well as explaining why the cube root 
subtrahend requires such an awkward adjustment, it offers a 
possible way of constructing algorithms for other mathematical 
functions. 
In the algorithm for square root extraction, the input number is shifted 
left, 2 bits at a time, into a working accumulator. At some arbitrary stage 
in the process, the ‘virtual input’ is that part of the input which has been 
shifted in and operated upon; call it ‘x’. The results of processing, so far, 
are a partial result ‘s’ for the square root, and a partial remainder ‘r’ in the 
working accumulator: 

    virtual input = x 
    partial result = s 
    partial remainder = r 

these quantities being related by the equation: 

    s² + r = x 

After the next round of processing, the virtual input has been augmented 
by the 2-bit number ‘y’ (the next 2 bits of the real input). The next least 
significant bit of the result has been found; call it ‘d’, so the new partial 
result is 2s + d. The remainder has also changed: 

    virtual input = 4x + y 
    partial result = 2s + d 
    partial remainder = q 

So the previous equation now becomes: 

    (2s + d)² + q = 4x + y 

Combine the two equations to eliminate ‘x’, remembering that as ‘d’ is a 
single bit (zero or one), d² reduces to d. The result is: 

    q = 4r + y 

when d=0, indicating that nothing has been subtracted from the working 
accumulator; or else, 

    q = 4r + y - (4s + 1) 

when d=1, i.e. the result bit is set after a subtraction. 

This shows that the required subtrahend is 4s+1; that is, the previous 
partial result shifted 2 places left, then incremented. 
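
The square root analysis can be checked numerically. The following is a minimal Python sketch, not the 6502 routine: the generator name `sqrt_steps` is an assumption for this example, while the variables x, s, r, y, d and q follow the analysis above. It asserts after every round that the relation between partial result, remainder and virtual input still holds.

```python
def sqrt_steps(square, root_bits=8):
    """Yield (x, s, r) after each round of the bit-pair method."""
    s = 0   # partial result
    r = 0   # partial remainder
    x = 0   # virtual input: the bits shifted in so far
    for i in reversed(range(root_bits)):
        y = (square >> (2 * i)) & 3      # next 2 bits of the real input
        x = 4 * x + y
        q = 4 * r + y                    # new remainder if d = 0
        d = 1 if q >= 4 * s + 1 else 0   # will the subtrahend 4s+1 go?
        if d:
            q -= 4 * s + 1
        s, r = 2 * s + d, q
        assert s * s + r == x            # s squared plus r equals x
        yield x, s, r


*_, (x, s, r) = sqrt_steps(1000)
print(s, r)   # 31 39
```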


The same analysis can be applied to a hypothetical algorithm for 
extracting cube roots. It yields the subtrahend value which should be used, 
if the method is to be sound. Starting again at some arbitrary point during 
execution, we have: 
    virtual input = x 
    partial result = c 
    partial remainder = r 

this time related by the equation: 

    c³ + r = x 

Now the next 3 bits of the input number are shifted into the working 
accumulator, giving a new virtual input of 8x+y. After processing, a new 
result bit ‘d’ and a new partial remainder have been calculated: 

    virtual input = 8x + y 
    partial result = 2c + d 
    partial remainder = q 

and the relevant equation is: 

    (2c + d)³ + q = 8x + y 

As before, the two equations are combined to eliminate the unknown ‘x’, 
and the result examined assuming d=0, then d=1. The first case 
(subtraction failed) gives: 

    q = 8r + y 

The second gives: 

    q = 8r + y - (12c² + 6c + 1) 

This does not augur well for the direct method of cubic root extraction; the 
required subtrahend is an ugly quadratic function of the ‘result-so-far’, 
which looks like it will require recalculation, squaring and all, after each 
round of processing. Luckily, it doesn’t: because the subtrahend can be 
built up progressively, from an initial non-zero ‘c’ of 1 (so c² = 1), some 
multiple of ‘c²’ always exists in the previous subtrahend. After a little 
more algebra, the cube root algorithm, devoid of full-length multiplication, 
is obtained. 


The algorithm is shown in the documentation ACTION section of 
CURT16 and CURT32. 
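
To see how the progressively built subtrahend tracks the quadratic from the analysis, here is a minimal Python sketch of the 16-bit case. It follows the ACTION pseudo-code rather than the 6502 listing; the function name `cube_root_16` is an assumption for this example. The stored subtrahend is taken to be 12c² + 6c, with the final "+ 1" supplied by the comparison and the subtraction, and an assertion checks that reading after every round.

```python
def cube_root_16(cube):
    """Return (root, remainder) for a 16-bit unsigned cube."""
    root = rem = snd = 0
    for count in range(6, 0, -1):
        # The first round sees only the top bit: 16 = 1 + 5 * 3.
        group = (cube >> (3 * (count - 1))) & 7
        rem = (rem << 3) | group
        if rem > snd:
            rem -= snd + 1
            root = root * 2 + 1
            snd = snd * 4 + root * 18
        else:
            root = root * 2
            snd = snd * 4 - root * 6
        # The updates keep the subtrahend equal to 12c^2 + 6c.
        assert snd == 12 * root * root + 6 * root
    return root, rem


print(cube_root_16(1000))    # (10, 0)
print(cube_root_16(65535))   # (40, 1535)
```

Only shifts, additions and small constant multiples of the part-root are needed, which is what makes the method practical on a processor with no multiply instruction.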
:JOB       To extract the integer cube root and remainder
:          of a 16-bit unsigned integer.
:ACTION    Clear part-root, remainder and subtrahend.
:          FOR root bits count 6 TO 1:
:            [ IF count = 6
:              THEN: [ Shift 1 cube bit into remainder. ]
:              ELSE: [ Shift 3 cube bits into remainder. ]
:              IF remainder > subtrahend:
:              THEN:
:                [ Remainder = remainder - (subtrahend + 1).
:                  Part-root = part-root * 2 + 1.
:                  Subtrahend = subtrahend * 4
:                               + part-root * 18. ]
:              ELSE:
:                [ Part-root = part-root * 2.
:                  Subtrahend = subtrahend * 4
:                               - part-root * 6. ] ]
:CPU       6502
:HARDWARE
:SOFTWARE
:INPUT     M0,1 contains the 16-bit cube.
:OUTPUT    M0,1 is unchanged.
:          M4,5 contains the 6-bit integer cube root.
:          M2,3 contains the 13-bit remainder.
:ERRORS
:REG USE
:STACK USE
:RAM USE   M0 to M5
:LENGTH    157
:CYCLES    Minimum: 1403. Maximum: 1431.
:CLASS 1   *discreet *interruptable *promable
:          *reentrant *relocatable *robust

CUBE    =    M0          :2-byte stored cube, input and output.
REM     =    M2          :2-byte store for working minuend and
                         :output 13-bit remainder.
ROOT    =    M4          :2-byte store for working part-root and
                         :output 6-bit cube root (M5 = 0).
COUNT   =    M5          :1-byte for loop count.
SND     =    M6          :2-bytes for working subtrahend.

CURT16  PHP              :Save all                            08
        PHA              :flags                               48
        TXA              :and                                 8A
        PHA              :registers                           48
        TYA              :to be                               98
        PHA              :used in routine.                    48
        LDA  SND+1       :Save two bytes                      A5 M7
        PHA              :of page zero                        48
        LDA  SND         :for use as                          A5 M6
        PHA              :subtrahend.                         48
        CLD              :Ensure binary arithmetic.           D8
        LDA  #0          :Prepare to clear 6 bytes for        A9 00
        LDX  #6          :workspace and variables.            A2 06
CLRLP   STA  REM-1,X     :M7 & M6 for subtrahend,             95 M1
        DEX              :M5 & M4 for root (M5 also count),   CA
        BNE  CLRLP       :M3 & M2 for remainder.              D0 FB
        LDY  #6          :Initialise count for                A0 06
        STY  COUNT       :6-bit root.                         84 M5
        LDX  #1          :Do only 1 shift, 1st iteration.     A2 01
CLOOP   ASL  CUBE        :Shift next cube bit                 06 M0
        ROL  CUBE+1      :out into Carry flag and             26 M1
        BCC  REMIN       :rotate back in to cube so it        90 02
        INC  CUBE        :is unchanged on exit.               E6 M0
REMIN   ROL  REM         :Shift cube bit                      26 M2
        ROL  REM+1       :up into remainder.                  26 M3
        DEX              :Do 1 bit first iteration,           CA
        BNE  CLOOP       :3 bits in other five.               D0 F1
        ASL  ROOT        :New part-root if no subtraction.    06 M4
        LDA  REM         :Try to subtract subtrahend          A5 M2
        SBC  SND         :from remainder (in each             E5 M6
        STA  REM         :iteration, remainder is             85 M2
        LDA  REM+1       :last remainder * 8 + next set       A5 M3
        SBC  SND+1       :of three bits from cube).           E5 M7
        STA  REM+1       :But if subtraction result is        85 M3
        BCC  ADBACK      :negative then go add it back.       90 09
        INC  ROOT        :Else show result in part-root.      E6 M4
        LDA  ROOT        :Get part-root * 3 in A,             A5 M4
        ASL  A           :(ready for later multiplying        0A
        ADC  ROOT        :by 6 to give part-root * 18).       65 M4
        BCC  NEWSND      :Branch always to form new SND.      90 0F
ADBACK  SEC              :Add current subtrahend              38
        LDA  REM         :back to remainder                   A5 M2
        ADC  SND         :and restore it to                   65 M6
        STA  REM         :the value held                      85 M2
        LDA  REM+1       :before subtraction.                 A5 M3
        ADC  SND+1       :Add takes it past $FFFF so          65 M7
        STA  REM+1       :Carry will be set at NEWSND.        85 M3
        LDA  ROOT        :Get part root for multiply by 6.    A5 M4
NEWSND  PHP              :Save add/subtract flag (C).         08
        ASL  SND         :Shift SND by one bit to             06 M6
        ROL  SND+1       :multiply by 2.                      26 M7
        LDY  SND         :Save it in XY for later             A4 M6
        LDX  SND+1       :add/subtract of root * n.           A6 M7
        STA  SND         :SND = part-root * 1 (or 3).         85 M6
        ASL  A           :A = p-r * 2 (or 6) with overflow    0A
        ROL  SND+1       :bit saved in SND+1. Clears Carry.   26 M7
        ADC  SND         :SND = lo-byte result of             65 M6
        STA  SND         :p-r * 3 (or 9).                     85 M6
        LDA  SND+1       :Get "* 2 (or 6)" overflow bit       A5 M7
        AND  #1          :without changing Carry and add      29 01
        ADC  #0          :carry from lo-byte result,          69 00
        STA  SND+1       :M6,7 = part-root * 3 (or 9).        85 M7
        PLP              :Restore add/subtract flag           28
        BCS  SNDSUB      :and do appropriate action.          B0 0C
        TYA              :Add part-root * 9                   98
        ADC  SND         :to subtrahend * 2                   65 M6
        STA  SND         :putting result                      85 M6
        TXA              :back to                             8A
        ADC  SND+1       :subtrahend variable, then           65 M7
        STA  SND+1       :branch always to shift effecting    85 M7
        BCC  SNDDBL      :"SND = SND * 4 + ROOT * 18".        90 0A
SNDSUB  TYA              :Subtract part-root * 3              98
        SBC  SND         :from subtrahend * 2                 E5 M6
        STA  SND         :putting result back                 85 M6
        TXA              :to subtrahend variable.             8A
        SBC  SND+1       :Following shift will effect         E5 M7
        STA  SND+1       :"SND = SND * 4 - ROOT * 6".         85 M7
SNDDBL  ASL  SND         :Shift to multiply by 2, and         06 M6
        ROL  SND+1       :complete new subtrahend.            26 M7
        LDX  #3          :Shift 3 bits into REM next time.    A2 03
        DEC  COUNT       :Repeat for 6-bit cube root,         C6 M5
        BNE  CLOOP       :leaving M5 = 0 as root hi-byte.     D0 8C
        PLA              :Restore two                         68
        STA  SND         :page zero bytes                     85 M6
        PLA              :used for                            68
        STA  SND+1       :subtrahend.                         85 M7
        PLA              :Restore                             68
        TAY              :all                                 A8
        PLA              :registers                           68
        TAX              :and                                 AA
        PLA              :flags,                              68
        PLP              :then exit with integer cube root,   28
        RTS              :remainder and intact cube.          60


:JOB       To extract the integer cube root and remainder
:          of a 32-bit unsigned integer.
:ACTION    Clear part-root, remainder and subtrahend.
:          FOR root bits count 11 TO 1:
:            [ IF count = 11
:              THEN: [ Shift 2 cube bits into remainder. ]
:              ELSE: [ Shift 3 cube bits into remainder. ]
:              IF remainder > subtrahend:
:              THEN:
:                [ Remainder = remainder - (subtrahend + 1).
:                  Part-root = part-root * 2 + 1.
:                  Subtrahend = subtrahend * 4
:                               + part-root * 18. ]
:              ELSE:
:                [ Part-root = part-root * 2.
:                  Subtrahend = subtrahend * 4
:                               - part-root * 6. ] ]
:CPU       6502
:HARDWARE  None.
:SOFTWARE  Local subroutine use, acting on page zero
:          at X to X+3 (one argument),
:          "ASL4", "ROL4", "LSR4", "ROR4",
:          and X+4 to X+7 (2nd argument),
:          "ADD4", "ADC4", "SUB4", "SBC4".
:INPUT     M0,1,2,3 contains the 32-bit cube.
:OUTPUT    M0,1,2,3 is unchanged.
:          M8,9,A,B contains the 11-bit integer cube root.
:          M4,5,6,7 contains the 23-bit remainder.
:ERRORS    None.
:REG USE   None
:STACK USE 11
:RAM USE   M0 to MB
:LENGTH    187 (CURT32 = 139, Subroutines = 48).
:CYCLES    Minimum: 8599. Maximum: 9420.
:CLASS 2   *discreet *interruptable *promable
:          *reentrant -relocatable *robust



CUBEL   =    M0          :4-byte stored cube, input and output.
REML    =    M4          :4-byte store for working minuend and
                         :output 23-bit remainder.
ROOTL   =    M8          :4-byte store 11-bit root output.
WSND    =    M8          :4-bytes for working subtrahend.
WPRT    =    MC          :4-bytes for working part-root.

CURT32  PHP              :Save all                            08
        PHA              :flags                               48
        TXA              :and                                 8A
        PHA              :registers                           48
        TYA              :to be                               98
        PHA              :used in routine.                    48
        CLD              :Ensure binary arithmetic.           D8
        LDX  #4          :Index from high byte.               A2 04
PSHWS   LDA  WPRT-1,X    :Save 4 bytes of page                B5 MB
        PHA              :zero workspace MF to MC             48
        DEX              :for use as working                  CA
        BNE  PSHWS       :part-root.                          D0 FA
        LDA  #0          :Clear 12 bytes for                  A9 00
        LDX  #12         :workspace and variables.            A2 0C
CLRVLP  STA  REML-1,X    :MF,E,D,C for part root,             95 M3
        DEX              :MB,A,9,8 for subtrahend,            CA
        BNE  CLRVLP      :M7,6,5,4 for remainder.             D0 FB
        LDA  #11         :Count for 11-bit root.              A9 0B
        LDY  #2          :Get 2 cube bits, first time.        A0 02
CUBELP  PHA              :Save count.                         48
NXTBIT  LDX  #CUBEL      :Shift next cube bit                 A2 M0
        JSR  ASL4        :out into Carry flag and             20 lo hi
        BCC  ROTREM      :rotate back in to cube so it        90 02
        INC  CUBEL       :is unchanged on exit.               E6 M0
ROTREM  LDX  #REML       :Shift cube bit                      A2 M4
        JSR  ROL4        :up into remainder.                  20 lo hi
        DEY              :Do 2 bits first iteration,          88
        BNE  NXTBIT      :3 bits in other ten.                D0 EF
        LDX  #REML       :Index remainder and go              A2 M4
        JSR  SBC4        :subtract subtrahend + 1.            20 lo hi
        LDX  #WPRT       :Part-root = part-root * 2           A2 MC
        JSR  ROL4        :+ subtraction result.               20 lo hi
        LDA  WPRT        :Get subtraction result out          A5 MC
        ROR  A           :into Carry and go add               6A
        BCC  ADDBCK      :back if negative result.            90 1B
        JSR  ASL4        :Multiply WPRT by 2,                 20 lo hi
        JSR  ASL4        :multiply WPRT by 4, then            20 lo hi
        LDX  #WSND       :index subtrahend to get             A2 M8
        JSR  ADD4        :old WSND + WPRT * 4, leaving        20 lo hi
        JSR  LSR4        :X indexing WPRT, so restore         20 lo hi
        JSR  LSR4        :to correct part-root.               20 lo hi
        LDX  #WSND       :Index subtrahend, then              A2 M8
        JSR  ASL4        :old WSND * 2 + WPRT * 8,            20 lo hi
        JSR  ADD4        :old WSND * 2 + WPRT * 9,            20 lo hi
        BCC  LSTDBL      :then go complete new WSND.          90 11
ADDBCK  SEC              :Add subtrahend + 1 back             38
        LDX  #REML       :to remainder, restoring it          A2 M4
        JSR  ADC4        :to value held before sub.           20 lo hi
        JSR  SUB4        :Old WSND - WPRT.                    20 lo hi
        LDX  #WSND       :Restore index, then                 A2 M8
        JSR  ASL4        :old WSND * 2 - WPRT * 2.            20 lo hi
        JSR  SUB4        :Old WSND * 2 - WPRT * 3.            20 lo hi
LSTDBL  LDX  #WSND       :Old WSND * 4 - WPRT * 6, or         A2 M8
        JSR  ASL4        :old WSND * 4 + WPRT * 18.           20 lo hi
        LDY  #3          :Get 3 cube bits next time.          A0 03
        PLA              :Restore count and                   68
        SEC              :decrement it.                       38
        SBC  #1          :Repeat for all 11 bits              E9 01
        BNE  CUBELP      :of cube root.                       D0 A6
        LDX  #0          :Index from low order bytes.         A2 00
PLLWS   LDA  WPRT,X      :Move working part root              B5 MC
        STA  ROOTL,X     :to output root.                     95 M8
        PLA              :Restore 4 page zero                 68
        STA  WPRT,X      :bytes used for working              95 MC
        INX              :part root.                          E8
        CPX  #4          :                                    E0 04
        BNE  PLLWS       :                                    D0 F4
        PLA              :Restore                             68
        TAY              :all                                 A8
        PLA              :registers                           68
        TAX              :and                                 AA
        PLA              :flags, then exit                    68
        PLP              :with integer cube root,             28
        RTS              :remainder and intact cube.          60

:...Subroutines.

ASL4    CLC              := ROL4 with clear carry.            18
ROL4    ROL  0,X         :Rotate 4 bytes                      36 00
        ROL  1,X         :from low order                      36 01
        ROL  2,X         :to high order,                      36 02
        ROL  3,X         :leaving X unchanged and             36 03
        RTS              :Carry = carry out.                  60

LSR4    CLC              := ROR4 with clear carry.            18
ROR4    ROR  3,X         :Rotate 4 bytes                      76 03
        ROR  2,X         :from high order                     76 02
        ROR  1,X         :to low order                        76 01
        ROR  0,X         :leaving X unchanged and             76 00
        RTS              :Carry = carry out.                  60

ADD4    CLC              := ADC4 with no carry in.            18
ADC4    LDY  #4          :Count 4 bytes.                      A0 04
AD4LP   LDA  0,X         :Add 4 bytes, from low               B5 00
        ADC  4,X         :memory to high memory               75 04
        STA  0,X         :(X) = (X) + (X+4), leave            95 00
        INX              :X = input X + 4, Y = 0,             E8
        DEY              :A = result high order byte,         88
        BNE  AD4LP       :Z = 1 and C = carry out.            D0 F6
        RTS              :                                    60

SUB4    SEC              := SBC4 with no borrow in.           38
SBC4    LDY  #4          :Count 4 bytes.                      A0 04
SB4LP   LDA  0,X         :Subtract 4 bytes, from low          B5 00
        SBC  4,X         :memory to high memory               F5 04
        STA  0,X         :(X) = (X) - (X+4), leave            95 00
        INX              :X = input X + 4, Y = 0,             E8
        DEY              :A = result high order byte,         88
        BNE  SB4LP       :Z = 1 and C = borrow out.           D0 F6
        RTS              :                                    60
14 Converting Decimal to Binary 


The usual method of converting from one base into another, say 
from base-A to base-B, is to initially clear a base-B accumulator 
and then repeat the following sequence until all base-A digits have 
been processed. 


1 Multiply base-B accumulator by base-A using base-B arithmetic. 


2 Multiply base-A number by base-A getting next digit out as 
overflow. 


3 Add base-A overflow digit to base-B accumulator using base-B 
arithmetic. 


When converting from binary to any other base, operation (1) can be 
done easily by a left shift through the entire binary number. The 
highest binary digit (bit) overflows into the Carry flag. Operation (3) 
can be performed at the same time as operation (2) by adding in the 
Carry as the partial result is added to itself for the multiplication by 
2. 
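
The binary case can be sketched as follows in Python, with a list of decimal digits standing in for the base-B accumulator. The function name `binary_to_decimal_digits` is an assumption for this example; each bit shifted out of the binary number plays the part of the Carry flag and is added in while the decimal accumulator is doubled.

```python
def binary_to_decimal_digits(value, bits=16):
    """Convert an unsigned binary value to a list of decimal digits."""
    width = len(str((1 << bits) - 1))    # digits needed for the largest value
    digits = [0] * width                 # least significant digit first
    for i in reversed(range(bits)):
        carry = (value >> i) & 1         # bit shifted out into "Carry"
        for j in range(width):
            # Add each digit to itself plus the carry in, in decimal.
            doubled = digits[j] * 2 + carry
            digits[j] = doubled % 10
            carry = doubled // 10
    return digits[::-1]                  # most significant digit first


print(binary_to_decimal_digits(65535))   # [6, 5, 5, 3, 5]
```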


Converting to binary from another base is not so easy. The binary 
partial result has to be multiplied by the base of the source number. 
Luckily, most conversions are from decimal (base 10) and there is a 
simple method of multiplying a binary number by 10 using shifts 
(binary multiplication by 2) and addition. 


1 Shift number left by one bit (number * 2). 
2 Save number. 
3 Shift number left by one bit (number * 4). 
4 Shift number left by one bit (number * 8). 
5 Add in saved number (number * 10). 
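
The five steps translate directly into shifts and one addition. Below is a minimal Python sketch; the name `times_ten` and the 32-bit mask are assumptions for this example, the mask merely imitating a fixed-width accumulator.

```python
MASK32 = 0xFFFFFFFF   # keep results within a 32-bit accumulator


def times_ten(number):
    """Multiply by 10 using three left shifts and one addition."""
    number = (number << 1) & MASK32      # 1: number * 2
    saved = number                       # 2: save number * 2
    number = (number << 1) & MASK32      # 3: number * 4
    number = (number << 1) & MASK32      # 4: number * 8
    return (number + saved) & MASK32     # 5: number * 8 + number * 2


print(times_ten(1234))   # 12340
```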


32-BIT SIGNED NUMBERS 


SADB4 and SBAD4 are the conversion section of a 32-bit signed 
number suite that also includes routines to negate, multiply, divide 
and get the absolute value. The arithmetic routines, including 
SNEG4 which is called by both SADB4 and SBAD4, appear later in the 
book. 


The 32-bit suite made its first appearance in Sub Set as a set of 5 Z-80 
routines (modesty prevents me from crediting its authorship). It was 
soon followed by a marvellous translation into 6502 code by Dennis 
May. I must say that Dennis’s code used the available registers and 
page zero pseudo-registers to far better effect than the use made of 
the Z-80’s many registers by the original routines. Then, as though 
to demonstrate the maxim that there’s always a better way if you 
bother to look for it, Vincent Fojut went to work on the 6502 suite 
and produced faster, shorter code. 


Another maxim states that you can have the fastest code or the 
shortest but not both. You can have SADB4 at 124 bytes and a 
maximum of 1,650 clock cycles (SADB4Q) or at 99 bytes but much 
slower with a maximum execution time of 2,878 clock cycles 
(SADB4S). There is only one version of SBAD4. 


Both forms of SADB4 use the multiplicative method of conversion 
described above. This is a reasonably quick conversion method but 
testing for errors, such as overflow of the result accumulator, does 
tend to add a lot of timing overheads to any routine. If the absolute 
value of the input number string is validated as less than 2³¹ then 
both SADB4Q and SADB4S can be written to operate much quicker 
by leaving out the overflow checks. 
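
The cost of the checks is easier to see in outline. The following Python sketch mirrors the order of operations described for SADB4 (double, add the digit, double twice more, add back) with an overflow test after each stage. The function name `ascii_to_int32` and the `(overflow, value)` return convention are assumptions for this example, standing in for the Carry flag and the result in M0 to M3.

```python
LIMIT = 1 << 31   # magnitudes must stay below 2**31


def ascii_to_int32(text):
    """Return (overflow, value) for a signed decimal string."""
    first = text[:1]
    pos = 1 if first in ("+", "-") else 0
    result = 0
    while pos < len(text) and "0" <= text[pos] <= "9":
        digit = ord(text[pos]) - 0x30    # strip the ASCII high nibble
        doubled = result * 2
        if doubled >= LIMIT:
            return True, None
        tmp = doubled + digit            # last result * 2 + new digit
        if tmp >= LIMIT:
            return True, None
        for factor in (4, 8):            # two further doublings
            if result * factor >= LIMIT:
                return True, None
        result = result * 8 + tmp        # last result * 10 + new digit
        if result >= LIMIT:
            return True, None
        pos += 1
    return False, (-result if first == "-" else result)


print(ascii_to_int32("-1985\r"))       # (False, -1985)
print(ascii_to_int32("2147483648"))    # (True, None)
```

Any non-digit character ends the conversion, as the routines require; four tests per digit are made here, which is the overhead the text refers to.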


:JOB       To convert an integer decimal value held as a
:          string of ASCII digits in memory and preceded by
:          a "+" or "-" sign (or assumed positive if no
:          sign) to a two's complement signed 32-bit number
:          (+/- 2**31 - 1) held in registers.
:          Priority: speed.
:ACTION    ON overflow: [ Set overflow flag. Exit. ]
:          Clear 32-bit result accumulator.
:          Read and save first string byte.
:          IF byte = "+" or "-" sign THEN:
:            [ Read next string byte. ]
:          IF byte is ASCII digit THEN:
:            [ Convert digit to binary.
:              REPEAT:
:                [ Result = result * 10 + digit.
:                  Read next string byte.
:                  IF byte is ASCII digit THEN:
:                    [ Convert digit to binary. ] ]
:              UNTIL non-digit terminator. ]
:          IF first byte = "-" THEN: [ Negate result. ]
:CPU       6502
:HARDWARE  Memory containing ASCII decimal string.
:SOFTWARE  "SNEG4" - Routine to negate a 32-bit two's
:          complement signed integer held in M0 to M3.
:INPUT     MC,D addresses the first byte of the string.
:          The string must be stored high order low memory
:          and terminate with any non-digit character.
:OUTPUT    MC,D is unchanged. X contains 1st string byte.
:          Y indexes string terminator.
:          M4, M5, M6, M7 and A are changed.
:          C = 0: conversion successful.
:                 M0,1,2,3 contains signed binary value.
:          C = 1: overflow has occurred.
:ERRORS    Arithmetic error if D = 1 (decimal mode).
:REG USE   P A X Y
:STACK USE 2
:RAM USE   M0 to M7, MC and MD
:LENGTH    124
:CYCLES    Minimum: 203
:          Maximum: 1650 (excluding any leading zeros).
:CLASS 2   -discreet *interruptable *promable
:          *reentrant *relocatable -robust

BIN     =    M0          :4-byte store for binary result.
TMP     =    M4          :4-byte temporary storage.
STR     =    MC          :2-byte stored address of first byte of
                         :ASCII number string.

SADB4Q  LDY  #0          :Index STR from zeroth byte.         A0 00
        STY  BIN         :Use Y = 0 to initially              84 M0
        STY  BIN+1       :clear result                        84 M1
        STY  BIN+2       :accumulator.                        84 M2
        STY  BIN+3       :                                    84 M3
        LDA  (STR),Y     :Get first string byte and           B1 MC
        TAX              :save for later negate test.         AA
        CMP  #"-"        :Test if negative sign and           C9 2D
        BEQ  SDBQ2       :go get next byte if so.             F0 56
        CMP  #"+"        :Test for positive sign and          C9 2B
        BNE  SDBQ3       :go test digit validity if           D0 55
        BEQ  SDBQ2       :not, else get next byte.            F0 50
SDBQ1   ASL  BIN         :Shift partial                       06 M0
        ROL  BIN+1       :result up by one bit:               26 M1
        ROL  BIN+2       :BIN = last result * 2.              26 M2
        ROL  BIN+3       :                                    26 M3
        BMI  OVFWQ       :Exit if too big.                    30 59
        ADC  BIN         :Add new digit                       65 M0
        STA  TMP         :in to result so far                 85 M4
        LDA  BIN+1       :while transferring                  A5 M1
        ADC  #0          :it to temporary                     69 00
        STA  TMP+1       :storage accumulator.                85 M5
        LDA  BIN+2       :TMP = last result * 2               A5 M2
        ADC  #0          :+ new digit.                        69 00
        STA  TMP+2       :                                    85 M6
        LDA  BIN+3       :                                    A5 M3
        ADC  #0          :                                    69 00
        STA  TMP+3       :                                    85 M7
        BMI  OVFWQ       :Exit if too big.                    30 41
        ASL  BIN         :Shift partial                       06 M0
        ROL  BIN+1       :result up by                        26 M1
        ROL  BIN+2       :one bit.                            26 M2
        ROL  BIN+3       :BIN = last result * 4.              26 M3
        BMI  OVFWQ       :Exit if too big.                    30 37
        ASL  BIN         :Shift partial                       06 M0
        ROL  BIN+1       :result up by                        26 M1
        ROL  BIN+2       :one bit.                            26 M2
        ROL  BIN+3       :BIN = last result * 8.              26 M3
        BMI  OVFWQ       :Exit if too big.                    30 2D
        LDA  BIN         :Add "last result * 2                A5 M0
        ADC  TMP         :+ new digit" back                   65 M4
        STA  BIN         :into BIN accumulator                85 M0
        LDA  BIN+1       :giving                              A5 M1
        ADC  TMP+1       :BIN = last result * 10              65 M5
        STA  BIN+1       :+ new digit.                        85 M1
        LDA  BIN+2       :                                    A5 M2
        ADC  TMP+2       :                                    65 M6
        STA  BIN+2       :                                    85 M2
        LDA  BIN+3       :                                    A5 M3
        ADC  TMP+3       :                                    65 M7
        STA  BIN+3       :                                    85 M3
        BVS  OVFWQ       :Exit if too big.                    70 13
SDBQ2   INY              :Index next string byte              C8
        LDA  (STR),Y     :and get in A.                       B1 MC
SDBQ3   SEC              :Prepare to subtract.                38
        SBC  #$30        :Strip ASCII digits hi-nib,          E9 30
        CMP  #$0A        :test if valid, go add in if         C9 0A
        BCC  SDBQ1       :so, else end of string.             90 A6
        CPX  #"-"        :Test for negative start             E0 2D
        BNE  SDBQ4       :sign, skipping if not,              D0 03
        JSR  SNEG4       :else negate binary result.          20 lo hi
SDBQ4   CLC              :Set valid result flag, C = 0        18
        RTS              :and exit, conversion done.          60
OVFWQ   SEC              :Set string too big flag,            38
        RTS              :C = 1 and exit on overflow.         60


:JOB       To convert an integer decimal value held as a
:          string of ASCII digits in memory and preceded by
:          a "+" or "-" sign (or assumed positive if no
:          sign) to a two's complement signed 32-bit number
:          (+/- 2**31 - 1) held in registers.
:          Priority: length.
:ACTION    ON overflow: [ Set overflow flag. Exit. ]
:          Clear 32-bit result accumulator.
:          Read first string byte and save sign test flag.
:          IF byte = "+" or "-" sign THEN:
:            [ Read next string byte. ]
:          IF byte is ASCII digit THEN:
:            [ Convert digit to binary.
:              REPEAT:
:                [ Result = result * 10 + digit.
:                  Read next string byte.
:                  IF byte is ASCII digit THEN:
:                    [ Convert digit to binary. ] ]
:              UNTIL non-digit terminator. ]
:          Restore sign test flag.
:          IF sign negative THEN: [ Negate result. ]
:CPU       6502
:HARDWARE  Memory containing ASCII decimal string.
:SOFTWARE  "SNEG4" - Routine to negate a 32-bit two's
:          complement signed integer held in M0 to M3.
:          "DBLBIN" - Local subroutine to multiply result
:          accumulator by 2.
:INPUT     MC,D addresses the first byte of the string.
:          The string must be stored high order low memory
:          and terminate with any non-digit character.
:OUTPUT    MC,D is unchanged. X contains 1st string byte.
:          Y indexes string terminator.
:          M4, M5, M6, M7 and A are changed.
:          C = 0: conversion successful.
:                 M0,1,2,3 contains signed binary value.
:          C = 1: overflow has occurred.
:ERRORS    Arithmetic error if D = 1 (decimal mode).
:REG USE   P A X Y
:STACK USE 2
:RAM USE   M0 to M7, MC and MD
:LENGTH    99
:CYCLES    Minimum: 351
:          Maximum: 2878 (excluding any leading zeros).
:CLASS 2   -discreet *interruptable *promable
:          *reentrant -relocatable -robust

STR     =    MC          :2-byte stored address of first byte of
                         :ASCII number string.
BIN     =    M0          :4-byte store for binary result.
TMP     =    M4          :4-byte temporary storage.

SADB4S  LDY  #0          :Index STR from zeroth byte.         A0 00
        LDX  #-4         :Count 4 bytes.                      A2 FC
SDBS1   STY  BIN+4,X     :Use Y = 0 to initially              94 M4
        INX              :clear result accumulator            E8
        BNE  SDBS1       :indexed by X.                       D0 FB
        LDA  (STR),Y     :Get first string byte and           B1 MC
        CMP  #"-"        :Test for negative sign,             C9 2D
        PHP              :save test flag (Z), and             08
        BEQ  SDBS5       :go get next byte if so.             F0 31
        CMP  #"+"        :Test for positive sign and          C9 2B
        BNE  SDBS6       :go test digit validity if           D0 30
        BEQ  SDBS5       :not, else get next byte.            F0 2B
SDBS2   STA  TMP         :Store new digit to temporary        85 M4
        STX  TMP+1       :accumulator and clear rest          86 M5
        STX  TMP+2       :of accumulator bytes by             86 M6
        STX  TMP+3       :X (= 0 after loops).                86 M7
        JSR  DBLBIN      :BIN = last result * 2.              20 lo hi
        LDX  #-4         :Index from low order bytes.         A2 FC
SDBS3   LDA  TMP+4,X     :Add "last result * 2"               B5 M8
        ADC  BIN+4,X     :to new digit                        75 M4
        STA  TMP+4,X     :giving,                             95 M8
        INX              :TMP = last result * 2               E8
        BNE  SDBS3       :+ new digit.                        D0 F7
        BVS  OVFWS2      :Exit if too big.                    70 27
        JSR  DBLBIN      :BIN = last result * 4.              20 lo hi
        JSR  DBLBIN      :BIN = last result * 8.              20 lo hi
        LDX  #-4         :Index from low order bytes.         A2 FC
SDBS4   LDA  BIN+4,X     :Add "last result * 2 + new          B5 M4
        ADC  TMP+4,X     :digit" back into BIN                75 M8
        STA  BIN+4,X     :accumulator giving,                 95 M4
        INX              :BIN = last result * 10              E8
        BNE  SDBS4       :+ new digit.                        D0 F7
        BVS  OVFWS2      :Exit if too big.                    70 14
SDBS5   INY              :Index next string byte              C8
        LDA  (STR),Y     :and get in A.                       B1 MC
SDBS6   SEC              :Prepare to subtract.                38
        SBC  #$30        :Strip ASCII digits hi-nib,          E9 30
        CMP  #$0A        :test if valid, go add in if         C9 0A
        BCC  SDBS2       :so, else end of string.             90 CB
        PLP              :Restore "-" test flags and          28
        BNE  SDBS7       :skip if not negative number,        D0 03
        JSR  SNEG4       :else negate binary result.          20 lo hi
SDBS7   CLC              :Set valid result flag, C = 0        18
        RTS              :and exit, conversion done.          60
OVFWS1  PLA              :DBLBIN overflow exit, clear         68
        PLA              :return address from stack.          68
OVFWS2  PLP              :Clear stacked "-" test flags.       28
        SEC              :Set string too big flag,            38
        RTS              :C = 1 and exit on overflow.         60

:...Subroutine to multiply result accumulator by 2.

DBLBIN  ASL  BIN         :Shift partial                       06 M0
        ROL  BIN+1       :result up by one bit:               26 M1
        ROL  BIN+2       :BIN = last BIN * 2.                 26 M2
        ROL  BIN+3       :Exit if too big through             26 M3
        BMI  OVFWS1      :routine overflow exit,              30 F1
        RTS              :else return normally.               60


SBAD4, converting from binary to ASCII decimal, uses the inverse 
process of division. This is a far slower method in most cases but 
may occasionally be the better way to do the job. In general terms, 
converting from base-A to base-B, the method is to initially clear a 
base-B accumulator and then repeat the following sequence for the 
number of digits in the base-B accumulator. 


1 Shift base-B accumulator right by one base-B digit (i.e. divide 
base-B accumulator by base-B leaving a digit space at the top). 


2 Divide base-A number by base-B using base-A arithmetic to give 
an integer quotient and remainder. 


3 Store the remainder in the highest digit space at the top of the 
base-B accumulator. 


Operation (1), which generally is quite slow, is dealt with in SBAD4 
by pushing each newly found digit onto stack. The digits are pulled 
off stack and stored to memory in the correct order when the 
conversion is complete. 
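
The divide-and-stack method can be modelled in a few lines of Python. This is only a sketch of the idea, not a translation of SBAD4: the function name and the use of a Python list as the stack are assumptions made for illustration.

```python
def to_digit_string(number, base=10):
    """Convert a non-negative integer to a digit string by repeated division.

    Digits come out lowest first, so each one is pushed on a stack and
    pulled off again afterwards to give the string highest digit first.
    """
    stack = []
    while True:
        number, digit = divmod(number, base)  # quotient and remainder
        stack.append(digit)                   # push newly found digit
        if number == 0:                       # UNTIL number = 0
            break
    text = ""
    while stack:
        text += chr(ord("0") + stack.pop())   # pull digit, add ASCII "0"
    return text


print(to_digit_string(2147483647))
```

Because the loop tests for zero after the first division, an input of 0 still produces the single digit "0".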


Strictly speaking, the range of a 32-bit 2’s complement signed 
number should run from $80 00 00 00 (-2,147,483,648 decimal, or 
-2^31) to $7F FF FF FF (+2,147,483,647 decimal, or 2^31 - 1) with bit 
31 giving the sign. However, 32 bits cannot hold the absolute value 
of the lowest figure (0 - (-2^31) = -2^31 to 32-bit precision) so it should 
be treated as either erroneous input or as overflow. 
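
The awkward lowest value is easy to demonstrate. The following Python sketch masks results to 32 bits to imitate 4-byte arithmetic; the names MASK32 and negate32 are illustrative assumptions and do not come from the book's listings.

```python
MASK32 = 0xFFFFFFFF  # keep results to 32 bits, as four 8-bit bytes would


def negate32(value):
    """Two's complement negation (0 - value) to 32-bit precision."""
    return (0 - value) & MASK32


def is_negative32(value):
    """Bit 31 gives the sign."""
    return bool(value & 0x80000000)


# Ordinary values change sign when negated ...
assert negate32(1) == 0xFFFFFFFF
assert negate32(0xFFFFFFFF) == 1
# ... but $80 00 00 00 negates to itself and is still "negative".
lowest = 0x80000000
assert negate32(lowest) == lowest
assert is_negative32(negate32(lowest))
```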


:JOB       To convert a two's complement signed 32-bit 
:          number (+/- 2**31 - 1) held in registers to an 
:          integer decimal value held as a string of ASCII 
:          digits in memory, preceded by "-" sign if 
:          negative, and terminated by an ASCII carriage 
:          return code ($0D). 
:ACTION    Address result string first byte. 
:          IF number is negative THEN: 
:          [ Write "-" to string first byte. 
:            Address next string byte. 
:            Get absolute value of number. ] 
:          Initialise stack count to 0. 
:          REPEAT: 
:          [ Digit = remainder (number / 10). 
:            Number = integer (number / 10). 
:            Push digit. 
:            Add 1 to stack count. ] 
:          UNTIL number = 0. 
:          FOR stack count: 
:          [ Pull digit. 
:            Digit = digit + ASCII "0". 
:            Write digit to string. 
:            Address next string byte. ] 
:          Write ASCII carriage return string terminator. 
:CPU       6502 
:HARDWARE  Memory to contain ASCII decimal string. 
:SOFTWARE  "SNEG4" - Routine to negate a 32-bit two's 
:          complement signed integer held in M0 to M3. 
:INPUT     M0,1,2,3 contains the signed 32-bit value. 
:          (M3 is the high order byte). 
:          MC,D addresses the first byte of the destination 
:          area for result string. 
:OUTPUT    MC,D is unchanged. 
:          Y indexes string terminator. 
:          M0,1,2,3 = 0. MF, P, A and X are changed. 
:ERRORS    Arithmetic error if D = 1 (decimal mode). 
:REG USE   P A X Y 
:STACK USE Maximum 10. 
:RAM USE   M0 M1 M2 M3 MC MD MF 
:LENGTH    71 
:CYCLES    Minimum: 1096. Maximum: 11202. 
:CLASS 2   -discreet   *interruptable   *promable 
:-****-    *reentrant  *relocatable     -robust 

STR     =  MC         :2-byte stored address of first byte of 
                      :destination for ASCII number string. 
BIN     =  M0         :4-byte stored binary value. 
PTR     =  MF         :For storing string pointer. 

SBAD4   LDY #0        :Index result string byte 0.       A0 00 
        BIT BIN+3     :Test binary sign bit 31           24 M3 
        BPL SBD1      :and skip if positive (= 0).       10 08 
        LDA #"-"      :Else store an ASCII negative      A9 2D 
        STA (STR),Y   :sign as string 1st byte           91 MC 
        INY           :then index next byte, get         C8 
        JSR SNEG4     :absolute value of number.         20 lo hi 
SBD1    STY PTR       :Save string index to free Y.      84 MF 
        LDX #0        :Set digits count to 0.            A2 00 
SBD2    LDY #32       :Set bit count for division.       A0 20 
        LDA #0        :Clear division accumulator.       A9 00 
SBD3    ASL BIN       :Shift number up by one bit 
        ROL BIN+1     :moving quotient in to 
        ROL BIN+2     :replace dividend, next 
        ROL BIN+3     :dividend bit shifted 
        ROL A         :into division accumulator. 
        CMP #10       :Test if 10 will subtract, 
        BCC SBD4      :skipping if not, else 
        SBC #10       :subtract 10 (decimal base) 
        INC BIN       :and set quotient bit. 
SBD4    DEY           :Repeat for all dividend bits, 
        BNE SBD3      :remainder is next digit. 
        PHA           :Push newly found digit and 
        INX           :count it for later pull. 
        LDA BIN+3     :Test all four bytes 
        ORA BIN+2     :of binary number 
        ORA BIN+1     :for zero. If zero then 
        ORA BIN       :all digits on stack, else 
        BNE SBD2      :go find next digit. 
        LDY PTR       :Restore string index to Y. 
SBD5    PLA           :Get next digit from stack, 
        CLC           :prepare to add. 
        ADC #"0"      :Add ASCII "0" as base value 
        STA (STR),Y   :and store digit, then 
        INY           :index next string byte. 
        DEX           :Repeat for all digits 
        BNE SBD5      :pushed in division loop. 
        LDA #13       :Finally, terminate string 
        STA (STR),Y   :with ASCII carriage return 
        RTS           :and exit, conversion done. 
Converting Gray, 


16-bit and Others 





Here are a few of the more unusual and challenging ideas that 
have appeared in the columns of Sub Set. Gray Code is a lazy way 
of incrementing to the next number (you only have to change one 
bit!) and the other base conversions should come in useful when 
we evolve to a life form with 36 fingers. The 16-bit conversion is 
quite normal except for a handy trick with the overflow flag. 


16-BIT TO ASCII DECIMAL 


PRAY by Andrew Watson converts the 16-bit unsigned number held 
in registers A and Y to a string of ASCII decimal digits. As in 
SBAD4, the 32-bit routine in the last chapter, PRAY stores the 
intermediate results on stack. The final result is not written to 
memory but sent to an output routine. Andrew has labelled the 
output subroutine CHROUT but this is merely a terminological 
convenience; any routine which outputs the ASCII character in A 
can be used by PRAY. 


PRAY’s conversion method is fundamentally the same as that used in 
SBAD4 but the implementation is interesting. Andrew has used 2 
bytes of stack, indexed by X, as workspace rather than the page zero 
pseudo-registers so use of PRAY is independent of the system 
software’s base page arrangements. The cost of such freedom is an 
extra 2 clock cycles for each rotation of each byte — possibly 3400 
cycles in total. 


Because A carries the last digit out of the conversion section of PRAY 
into the output section, it cannot be used to test if the number has 
reduced to zero by the normal method of LDA BYTE1: ORA 
BYTE2: BNE NXTDG. The method used by Andrew relies on the 
fact that ‘SBC #10’ is the only instruction in the ROLL loop to affect 
the overflow flag. As the subtraction is never performed unless A is 
10 or more, overflow cannot occur and V is always cleared. So, by 
setting V prior to the loop, BVC NXTDG will branch only when a 
subtraction has occurred in the loop, a quotient bit has been set and 
the number is not zero. 
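
The overflow flag trick can be followed in a simplified Python model. This is a sketch under stated assumptions, not PRAY itself: a boolean stands in for V, the loop runs a plain 16 passes rather than PRAY's 17, and the function name is invented for illustration.

```python
def divide_by_10(number, bits=16):
    """Shift-and-subtract division by 10 that also reports a 'V flag'.

    v_flag is set before the loop and cleared only when a subtraction
    takes place, so it stays set exactly when the quotient is zero.
    """
    v_flag = True                       # set V prior to the loop
    remainder = 0
    mask = (1 << bits) - 1
    for _ in range(bits):
        top_bit = (number >> (bits - 1)) & 1
        number = (number << 1) & mask   # next dividend bit moves out
        remainder = (remainder << 1) | top_bit
        if remainder >= 10:             # only then is 10 subtracted
            remainder -= 10
            number |= 1                 # quotient bit set
            v_flag = False              # the subtraction clears V
    return number, remainder, v_flag    # quotient, digit, flag


print(divide_by_10(1234))   # (123, 4, False): more digits to find
print(divide_by_10(7))      # (0, 7, True): last digit reached
```

The flag therefore replaces a separate test of both number bytes, which is what frees A to carry the digit.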
:JOB       To convert a 16-bit unsigned number held in 
:          registers to ASCII decimal, with leading zeros 
:          suppressed, and send the result to an output 
:          routine. 
:ACTION    Initialise digit as end-marker. 
:          REPEAT: 
:          [ Push digit. 
:            Digit = remainder (number / 10). 
:            Number = integer (number / 10). ] 
:          UNTIL number = 0. 
:          REPEAT: 
:          [ Digit = digit + ASCII "0". 
:            Call output routine. 
:            Pull digit. ] 
:          UNTIL digit = end-marker. 
:CPU       6502 
:HARDWARE  None. 
:SOFTWARE  "CHROUT" - a subroutine to output the ASCII 
:          character in A (to printer, screen memory, 
:          remote terminal, etc.) without changing other 
:          register values. 
:INPUT     A (high byte) and Y contain number to be output. 
:OUTPUT    A = 0. Y = -1 (unless changed by CHROUT). 
:ERRORS    Arithmetic error if D = 1 (decimal mode). 
:REG USE   A Y 
:STACK USE (Minimum 8, maximum 11) + CHROUT stack use. 
:RAM USE   None. 
:LENGTH    49 
:CYCLES    Approximately 44 + digits * (488 + CHROUT time). 
:CLASS 2   -discreet   *interruptable   *promable 
:-****-    *reentrant  *relocatable     -robust 

PRAY    PHP           :Save flags.                       08 
        PHA           :Push A and Y to store number      48 
        TYA           :on stack, high order              98 
        PHA           :byte in higher memory.            48 
        TXA           :Save X, for use                   8A 
        PHA           :indexing stack 2 bytes            48 
        TSX           :below stacked number.             BA 
        LDA #$80      :Push end-marker 1st time.         A9 80 
NXTDG   PHA           :Push last digit (or marker).      48 
        LDY #16       :Count for 16 bits.                A0 10 
        SEC           :Subtract to set overflow flag,    38 
        LDA #0        :cleared only on non-zero          A9 00 
        SBC #$80      :quotient in division.             E9 80 
ROLL    ROL A         :(1st time clears A). Rotate       2A 
        CMP #10       :next number bit into remainder    C9 0A 
        BCC NBIT      :in A, if >= 10 then subtract,     90 02 
        SBC #10       :set result bit C, clear V.        E9 0A 
NBIT    ROL $0102,X   :Shift result bit into number      3E 02 01 
        ROL $0103,X   :and next number bit to Carry.     3E 03 01 
        DEY           :Repeat 16 times for 16 bits       88 
        BPL ROLL      :after 1st time, 17 in all.        10 F0 
        BVC NXTDG     :Another digit if number > 0.      50 E6 
PRNTDG  ORA #$30      :Convert digit to ASCII            09 30 
        JSR CHROUT    :decimal and go output it.         20 lo hi 
        PLA           :Pull next byte and repeat if      68 
        BPL PRNTDG    :digit, else end-marker $80.       10 F8 
        PLA           :Restore X.                        68 
        TAX           :                                  AA 
        PLA           :Remove number from stack,         68 
        PLA           :leaving A = 0.                    68 
        PLP           :Restore flags and exit,           28 
        RTS           :number output.                    60 

OTHER BASES 


XBIN, by Dennis May, and BINX, an amalgamation of routines by 
Dennis May and Vincent Fojut, convert between 32-bit binary and 
numbers in other bases, from 2 to 36, stored as ASCII digits. 
Hexadecimal uses the upper-case letters A to F to stand for the digits 
following on from 9 and the idea is extended to use all the upper-case 
letters. 


You can use the method to convert between binary and numbers up 
to base 62 by bringing the lower-case letters into play. However, 
XBIN and BINX check only for the discontinuity between ASCII ‘9’ 
and ASCII ‘A’ which is used to codify the symbols :, ;, <, =, >, ? 
and @. There is another break between Z and a for the six symbols 
[, \, ], ^, _ and `. To jump it you will need to insert code for the 
appropriate test and corresponding addition or subtraction. 
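
As a sketch of what the extra test involves, the Python functions below map characters to digit values and back for bases up to 62, jumping both gaps in the ASCII sequence. The function names are hypothetical and the code models the arithmetic only; it is not a patch for XBIN or BINX.

```python
def digit_value(ch):
    """ASCII character to digit value 0-61, jumping both gaps."""
    value = ord(ch) - 0x30      # strip ASCII "0"
    if value >= 10:
        value -= 7              # seven symbols between "9" and "A"
    if value >= 36:
        value -= 6              # six symbols between "Z" and "a"
    return value


def digit_char(value):
    """Digit value 0-61 back to its ASCII character."""
    code = value + 0x30
    if value >= 10:
        code += 7
    if value >= 36:
        code += 6
    return chr(code)


print(digit_value("Z"), digit_value("a"), digit_char(61))
```

The second test must come after the first so that the value compared with 36 has already had the 9-to-A gap removed.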


:JOB       To convert an unsigned number, of any base 2 to 
:          36, held in memory as ASCII digits and upper 
:          case letters to a 32-bit binary number held in 
:          registers or base-page "pseudo-registers". 
:ACTION    Clear result. 
:          Get first character. 
:          ON overflow: [ Set overflow flag & exit. ] 
:          WHILE character NOT terminator: 
:          [ Convert character to binary coded digit. 
:            Result = result * base + digit. 
:            Index and get next character. ] 
:          Set conversion completed flag. 
:CPU       6502 
:HARDWARE  Memory containing ASCII number string. 
:SOFTWARE  None. 
:INPUT     M4,5 addresses 1st byte of ASCII string which 
:          must terminate with $0D (carriage return). 
:          M6 contains the number base. 
:OUTPUT    Registers changed. M7 to MC changed. 
:          M4 to M6 not changed. 
:          C = 0: conversion completed. 
:          32-bit result in M0 to M3 (M3 is high order). 
:          C = 1: overflow during process. 
:          M0 to M3 is indeterminate. 
:ERRORS    No test is made for non-upper-case alphanumeric 
:          characters in ASCII string. 
:          No test is made for digits greater than base. 
:          Arithmetic error if D = 1 (decimal mode). 
:REG USE   P A X Y 
:STACK USE 2 
:RAM USE   M0 to MC 
:LENGTH    106 
:CYCLES    Not given. 
:CLASS 2   -discreet   *interruptable   *promable 
:-***--    *reentrant  -relocatable     -robust 

ASCN    =  M4         :Stored address of ASCII string. 
BASE    =  M6         :Stored ASCII number base (2 to 36). 
BTMP    =  MB         :Storage for working BASE. 
RSLT    =  M0         :4-byte result location (low byte). 
RTMP    =  M7         :Storage for working RSLT (low byte). 
INDX    =  MC         :Storage for ASCII string pointer. 

XBIN    LDY #0        :Clear for RSLT clear.             A0 00 
        LDX #4        :Index for RSLT 4 bytes.           A2 04 
XBIN1   DEX           :Index RSLT next byte              CA 
        STY RSLT,X    :and clear it, repeat              94 M0 
        BNE XBIN1     :until RSLT clear. X = 0.          D0 FB 
        STX INDX      :Initialise ASCII index to 0.      86 MC 
XBIN2   LDY INDX      :Index current ASCII byte          A4 MC 
        LDA (ASCN),Y  :and pick it up.                   B1 M4 
        CMP #$0D      :If ASCII "carriage return"        C9 0D 
        BEQ END       :terminator then completed.        F0 52 
        SEC           :Strip ASCII digits high           38 
        SBC #$30      :nibble and test for if            E9 30 
        CMP #$0A      :greater than digit 9,             C9 0A 
        BCC ASCY      :adjusting for gap between         90 02 
        SBC #7        :"9" and "A" if it is.             E9 07 
ASCY    PHA           :Save new digit.                   48 
        LDA #0        :Clear for RTMP clear.             A9 00 
        LDX #4        :Index for RTMP 4 bytes.           A2 04 
XBIN3   DEX           :Index RTMP next byte              CA 
        STA RTMP,X    :and clear it, repeat              95 M7 
        BNE XBIN3     :until RTMP clear.                 D0 FB 
        LDA BASE      :Move base to temp byte for 
        STA BTMP      :use as multiplier. 
        LDY #8        :Count for 8-bit multiplier. 
XBIN4   ASL BTMP      :Shift next multiplier bit 
        PHP           :into C and save it. 
        ASL RTMP      :Shift left partial product 
        ROL RTMP+1    :for possible addition at 
        ROL RTMP+2    :next bit place. 
        ROL RTMP+3    : 
        BCS OVFW1     :Skip out if product too big. 
        PLP           :Get multiplier bit to C and 
        BCC XBIN6     :skip if 0, no add this place. 
        LDX #-4       :Else index from low bytes. 
        CLC           :Clear for low bytes add. 
XBIN5   LDA RSLT+4,X  :Add multiplicand byte to 
        ADC RTMP+4,X  :partial product. 
        STA RTMP+4,X  : 
        INX           :Index next and repeat for 
        BNE XBIN5     :all four bytes. 
        BCS OVFW2     :Out if product too big. 
XBIN6   DEY           :Repeat for all 8 bits of 
        BNE XBIN4     :multiplier (base). 
        CLC           :Clear for add. 
        PLA           :Get new digit and add to 
        ADC RTMP      :product low byte, result to 
        STA RSLT      :partial conversion result. 
        LDX #-3       :Index from byte 2. 
XBIN7   LDA RTMP+4,X  :Move other three product 
        ADC #0        :bytes to conversion result 
        STA RSLT+4,X  :adding in any carry from 
        INX           :digit add in to low byte and 
        BNE XBIN7     :subsequent carries. 
        BCS OVFW3     :Out if result too big. 
        INC INDX      :Index next ASCII byte and 
        JMP XBIN2     :continue conversion. 
END     CLC           :Flag conversion complete 
        RTS           :(C = 0) and exit. 
OVFW1   PLA           :Lose multiplier bit. 
OVFW2   PLA           :Lose digit. 
OVFW3   RTS           :Exit (C = 1) on overflow. 


:JOB       To convert a 32-bit binary number held in 
:          registers or base-page "pseudo-registers" to an 
:          unsigned number, of any base 2 to 36, held in 
:          memory as ASCII digits and upper-case letters. 
:ACTION    Push terminator byte. 
:          REPEAT: 
:          [ Digit = remainder (number / base). 
:            Number = integer (number / base). 
:            IF digit < 10 
:            THEN: [ Digit = digit + ASCII "0". ] 
:            ELSE: [ Digit = digit + ASCII "A" - 10. ] 
:            Push digit. ] 
:          UNTIL number = 0. 
:          Address destination. 
:          REPEAT: 
:          [ Pull byte and store to destination. 
:            Address next destination. ] 
:          UNTIL terminator byte stored. 
:CPU       6502 
:HARDWARE  Destination RAM for conversion result. 
:SOFTWARE  None. 
:INPUT     M0,1,2,3 contains unsigned 32-bit number. 
:          (High order byte in M3). 
:          M4,5 contains address of destination for result. 
:          MD contains conversion base (2 to 36). 
:OUTPUT    Conversion result at address in M4,5 (high order 
:          byte first) terminated by ASCII carriage return. 
:          M4 M5 & MD are unchanged. 
:          M0 = M1 = M2 = M3 = 0. A = $0D. Carry = 0. 
:          Y indexes terminator (Y = result string length). 
:ERRORS    Arithmetic error if D = 1 (decimal mode). 
:REG USE   P A Y 
:STACK USE Minimum: 2. Maximum: 33. 
:RAM USE   M0 M1 M2 M3 M4 M5 MD 
:LENGTH    55 
:CYCLES    Minimum: 1127. Maximum: 38538. 
:CLASS 2   -discreet   *interruptable   *promable 
:-****-    *reentrant  *relocatable     -robust 

BNO     =  M0         :4-byte stored binary number. 
ANOA    =  M4         :2-byte stored address for first byte 
                      :of ASCII base 2-36 result. 
BAS     =  MD         :Stored base, $02 to $24 (2 to 36). 
ABAS    =  $30        :ASCII "0" - Lowest converted value. 

BINX    LDA #13       :Push ASCII carriage return code   A9 0D 
        PHA           :on stack as stack base.           48 
REMLP   LDY #32       :Count for 32 bits in DIVLP.       A0 20 
        LDA #0        :Clear division accumulator.       A9 00 
DIVLP   ASL BNO       :Shift dividend/quotient           06 M0 
        ROL BNO+1     :up by one bit. Quotient shifts    26 M1 
        ROL BNO+2     :in to replace dividend.           26 M2 
        ROL BNO+3     :Next dividend bit is rotated      26 M3 
        ROL A         :into remainder-accumulator        2A 
        CMP BAS       :Test if base can be subtracted    C5 MD 
        BCC DIVLPT    :skipping if not, else             90 04 
        SBC BAS       :subtract base from remainder      E5 MD 
        INC BNO       :and set quotient result bit.      E6 M0 
DIVLPT  DEY           :Repeat for all dividend bits.     88 
        BNE DIVLP     :Remainder = next digit. Y = 0.    D0 EC 
        CMP #10       :If remainder-digit is in range    C9 0A 
        BCC ASCBAS    :0 to 9 then okay so skip, else    90 02 
        ADC #6        :add 7 (C = 1) for range A to Z.   69 06 
ASCBAS  ADC #ABAS     :ASCII starts at $30 = "0".        69 30 
        PHA           :Save digit in order on stack.     48 
        LDA BNO       :Test quotient from last           A5 M0 
        ORA BNO+1     :division (= dividend for next     05 M1 
        ORA BNO+2     :division) to see if reduced       05 M2 
        ORA BNO+3     :to zero (= conversion done)       05 M3 
        BNE REMLP     :and repeat until so.              D0 D5 
STORE   PLA           :Move one stacked digit to         68 
        STA (ANOA),Y  :destination. (Y = 0 at start.)    91 M4 
        INY           :Index next destination place.     C8 
        CMP #ABAS     :If last byte >= lowest digit      C9 30 
        BCS STORE     :then repeat, else carriage        B0 F8 
        RTS           :return so end, string stored.     60 


You can use the two routines in sequence to convert from one 
obscure, ASCII encoded base to another. With the address of the 
ASCII number string in M4 and M5, the current number base in M6 
and the required number base in MD, a call to BCNV will write the 
result string to the same location as the current string. 


BCNV    JSR XBIN      :Convert to 32-bit binary          20 lo hi 
        BCS OVFWX     :but exit, C = 1, on overflow.     B0 03 
        JMP BINX      :Go convert to new base, BINX      4C lo hi 
                      :exits with C = 0 = no overflow. 
OVFWX   RTS           :Exit on XBIN overflow.            60 

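
The two-stage conversion can be tried out in a Python model before committing it to machine code. This is a sketch only: the function names borrow the routine names for readability, Python strings replace the carriage-return terminated memory strings, and an exception stands in for the Carry flag on overflow.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def xbin(text, base):
    """Digit string in the given base to a 32-bit binary value."""
    result = 0
    for ch in text:
        result = result * base + DIGITS.index(ch)
        if result > 0xFFFFFFFF:          # will not fit in four bytes
            raise OverflowError("number too big for 32 bits")
    return result


def binx(number, base):
    """32-bit binary value to a digit string in the given base."""
    stack = []
    while True:
        number, digit = divmod(number, base)
        stack.append(DIGITS[digit])      # lowest digit found first
        if number == 0:
            break
    return "".join(reversed(stack))      # pulled off highest first


def bcnv(text, from_base, to_base):
    """One obscure base to another by way of binary."""
    return binx(xbin(text, from_base), to_base)


print(bcnv("FF", 16, 2))     # 11111111
print(bcnv("ZZ", 36, 10))    # 1295
```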
BINARY TO GRAY CODE 


The process of incrementing or decrementing a binary number may 
involve just a single-bit change, such as in the step from 6 to 7 
(%0110 to %0111). Usually, however, more than one bit changes 
state and in 4-bit wraparound arithmetic as many as all 4 bits can 
change when %1111 (decimal 15) increments to %0000 (decimal 0, 
or 16). 


In some applications the individual bits may not be packed together 
in a single byte. Accessing all of them could be a process that simply 
takes too long. In other cases an increment might have to be carried 
out by a masking operation — difficult if some bits have to be cleared 
and others set. Reading rapidly changing binary numbers one bit at a 
time could produce an out of sequence value if several bits have 
changed state during the access. 


Binary code is obviously not ideal for any of those situations but 
there is a type of code in which consecutive numbers differ by only 
one bit. Gray Code is used where speed and ease of change is 
important and where misreading the value can prove disastrous. It 
is often used in programming revolving mechanisms — which can be 
anything from the stepper motor on a plotter to a battleship gun 
turret. 


Several sequences meet the criterion of a single-bit change between 
consecutive numbers. The following table gives just one of the 
possible 4-bit sequences. 


Decimal   Hex   Binary   Gray Code 
   0       0     0000      0000 
   1       1     0001      0001 
   2       2     0010      0011 
   3       3     0011      0010 
   4       4     0100      0110 
   5       5     0101      0111 
   6       6     0110      0101 
   7       7     0111      0100 
   8       8     1000      1100 
   9       9     1001      1101 
  10       A     1010      1111 
  11       B     1011      1110 
  12       C     1100      1010 
  13       D     1101      1011 
  14       E     1110      1001 
  15       F     1111      1000 


GYCON by Vincent Fojut converts a 4-bit binary value to 4-bit Gray 
Code by means of a look-up table. Vincent supplied it to Sub Set at 
the same time as his GETLOC (see chapter 5) which is called by 
GYCON to get the address of a known point in the program into 
M0,1. The Gray Code value is then picked up by program relative 
addressing (masquerading as post indexed indirect addressing). 


= GYCON    Binary to Gray Code conversion. 
:JOB       To convert a 4-bit binary value to 4-bit Gray 
:          Code using a look-up table. 
:ACTION    Set pointer to start of look-up table. 
:          Add binary value to pointer. 
:          Read code addressed by pointer. 
:CPU       6502 
:HARDWARE  None. 
:SOFTWARE  "GETLOC" - A subroutine to return in M0,1 the 
:          address of the instruction immediately following 
:          "JSR GETLOC". 
:INPUT     Y = 4-bit binary value. 
:OUTPUT    A = 4-bit Gray Code. 
:          M0 M1 Y and P are changed. 
:ERRORS    The table will be exceeded if Y > 15. 
:          Arithmetic error if D = 1 (decimal mode). 
:REG USE   P A Y 
:STACK USE 2 
:RAM USE   M0 M1 
:LENGTH    26 
:CYCLES    48 (including GETLOC cycles). 
:CLASS 2   -discreet   *interruptable   *promable 
:-****-    *reentrant  *relocatable     -robust 

GYCON   JSR GETLOC    :Get address of 4th byte of        20 lo hi 
        TYA           :GYCON in M0,1. Add binary         98 
        ADC #7        :value to routine bytes - 3        69 07 
        TAY           :as index to table entry.          A8 
        LDA (M0),Y    :Get Gray Code equivalent          B1 M0 
        RTS           :and exit.                         60 
        .BYTE $00,$01 :Look-up table                     00 01 
        .BYTE $03,$02 :of 4-bit                          03 02 
        .BYTE $06,$07 :Gray Code                         06 07 
        .BYTE $05,$04 :equivalents                       05 04 
        .BYTE $0C,$0D :of binary                         0C 0D 
        .BYTE $0F,$0E :values                            0F 0E 
        .BYTE $0A,$0B :0 to 15                           0A 0B 
        .BYTE $09,$08 :in ascending order.               09 08 


COMPUTED GRAY CODE CONVERSION 


Using a look-up table to convert from binary to Gray Code is quick 
but unnecessarily lengthy. A computed conversion can be quicker 
for small values and is definitely shorter for 8-bit values where a 
256-byte look-up table would be needed. In fact the conversion need 
not be computed at all but can be done by specialised hardware 
circuits. BGCB is a Sub Set routine of mine which emulates the 
hardware logic. 


Each Gray Code bit is the complement of the corresponding binary 
value bit if the next higher binary bit is set. So the conversion 
method only needs to shift the partial result one-bit lower and 
exclusive-OR it with the original value. Try it out with several values 
from the table above. 
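
For anyone who would rather let a machine do the trying out, the rule reduces to one shift and one exclusive-OR. The Python sketch below checks it against the 4-bit Gray Code column of the table; the function name and the table constant are assumptions for illustration.

```python
def binary_to_gray(value):
    """Shift the value one bit lower and exclusive-OR with the original."""
    return value ^ (value >> 1)


# Gray Code column of the 4-bit table, for binary values 0 to 15.
TABLE = [0x0, 0x1, 0x3, 0x2, 0x6, 0x7, 0x5, 0x4,
         0xC, 0xD, 0xF, 0xE, 0xA, 0xB, 0x9, 0x8]

for n in range(16):
    assert binary_to_gray(n) == TABLE[n]

print(format(binary_to_gray(0b0110), "04b"))   # 0101
```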
Conversion of a 4-bit or even an 8-bit binary value to Gray Code by a 
computed method is hardly worth the bother of writing as a 
subroutine. It is only a 5-byte, 8 clock cycle sequence, easily written 
straight into any program: 


        STA M0        :Save 4-bit or 8-bit binary.       85 M0 
        LSR A         :Shift binary down 1 bit and       4A 
        EOR M0        :complement if next higher bit set. 45 M0 


What does make it worth writing as a subroutine is the fact that if the 
process is repeated for a count one less than the value bit length, it 
will convert back from Gray Code to binary. BGCB will convert both 
4-bit and 8-bit codes from binary to Gray Code on an input count of 
1 and from Gray Code to binary on counts of 3 or 7. I don’t know 
what it produces on other counts. 
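
The repeated process is short enough to model directly. This Python sketch follows the datasheet's ACTION line, Result = (result / 2) EOR (input value), and confirms the round trip for 4-bit and 8-bit values; the function name simply reuses the routine's name and is not the routine itself.

```python
def bgcb(value, count):
    """Repeat 'halve the result and EOR with the input' count times."""
    result = value
    for _ in range(count):
        result = (result >> 1) ^ value
    return result


gray = bgcb(0b1100, 1)              # binary to Gray Code, count of 1
print(format(gray, "04b"))          # 1010
print(format(bgcb(gray, 3), "04b")) # 1100, back to binary on count 3
```

A count of one less than the bit length works because each pass folds in one more shifted copy of the Gray Code, and a bit of the binary value is the exclusive-OR of all the Gray Code bits at or above it.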


= BGCB     Convert between Gray Code and binary. 
:JOB       To convert from a binary value to Gray Code or 
:          from Gray Code to binary, using either 4-bit or 
:          8-bit values in either case. 
:ACTION    FOR input count: 
:          [ Result = (result / 2) EOR (input value). ] 
:CPU       6502 
:HARDWARE  None. 
:SOFTWARE  None. 
:INPUT     Y = 1. 4-bit or 8-bit binary value in A. 
:          Y = 3. 4-bit Gray Code in A. 
:          Y = 7. 8-bit Gray Code in A. 
:OUTPUT    Converted value in A. 
:          P, X and Y are changed. 
:ERRORS    No check made for a valid count input. 
:REG USE   P A X Y 
:STACK USE 1 
:RAM USE   None. 
:LENGTH    11 
:CYCLES    Binary to Gray Code: 23. 
:          Gray Code to 4-bit Binary: 45. 
:          Gray Code to 8-bit Binary: 89. 
:CLASS 2   -discreet   *interruptable   *promable 
:-****-    *reentrant  *relocatable     -robust 

BGCB    TSX           :Index stack workspace where       BA 
        PHA           :input value is stored.            48 
BGCBLP  LSR A         :Exclusive-or half previous        4A 
        EOR $0100,X   :result with stacked input         5D 00 01 
        DEY           :code 1, 3 or 7 times,             88 
        BNE BGCBLP    :depending on input count.         D0 F9 
        TXS           :Tidy stack. Exit with Gray        9A 
        RTS           :from binary or vice versa.        60 
Arithmetic: 
16-bit 





The 6502 is the poor relation of the 8-bit processor family. It has 
no 16-bit registers on which to perform higher precision arithmetic. 
The routines in this chapter show how to use the few 8-bit 
registers that are available to it for binary multiplication and 
division and decimal halving on 2-byte values. 


A BCD EXERCISE 


Binary Coded Decimal digits are normally stored 2 to a byte. They 
utilise only the first 10 of the 16 values that can be coded in 4 bits, 
that is, %0000 to %1001, leaving %1010 to %1111 unused. The 6502 
will perform decimal addition or subtraction on BCD values if the 
decimal mode flag D is set. 


Dividing BCD numbers by 2 isn’t achieved easily on the 6502. 
Hexadecimal halving is just a matter of right shifting by one bit with 
any carry between digits having the correct value of 8 (half of 16). 
Right shifting a BCD value still gives a value of 8 to any inter-digit 
carry. This has to be tested for and adjusted to the correct decimal 
value of 5 by subtracting 3. 
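
The shift-and-adjust idea can be checked exhaustively in a Python model. This sketch treats the 4-digit BCD number as one 16-bit value rather than two registers, which is an assumption made for brevity; the function name is illustrative.

```python
def halve_bcd(value, digits=4):
    """Halve a packed BCD value; return (BCD result, remainder)."""
    remainder = value & 1              # bit shifted out of the units digit
    value >>= 1                        # hexadecimal halving
    for place in range(digits):
        carry_bit = 0x8 << (4 * place)
        if value & carry_bit:          # inter-digit carry arrived worth 8
            value -= 0x3 << (4 * place)   # adjust it to the decimal 5
    return value, remainder


half, rem = halve_bcd(0x1234)
print(format(half, "04X"), rem)        # 0617 0
```

Testing bit 3 of each digit is safe because a halved digit on its own can be no more than 4, so bit 3 can only have come from the digit above.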


When Sub Set issued a challenge for the shortest Z-80 code to halve a 
4-digit BCD value in the 16-bit HL register-pair there was quite a 
large response ranging in length from an anonymous contribution at 
81 bytes (!) to the shortest, my HLFHL, at 21 bytes. Before settling 
for HLFHL, I had spent a not insignificant number of hours 
investigating all the possible methods of dividing a 4-digit BCD 
number by 2 (and some impossible ones) so I must admit to being 
rather annoyed with myself for not spotting an improvement that 
reduced the length of HLFHL to 20 bytes. The person responsible for 
my ego deflation was Gavin Every. Gavin is also responsible for this 
neat 6502 version. 


HLFXY, as its name suggests, halves the BCD value held in the index 
registers X and Y. With the need to transfer the bytes to and from 
the accumulator for the shifts and tests, it is very compact at 27 
bytes. The only imperfection in the routine is that it doesn’t test for 
valid BCD input (presumably because the calling routine is 
expected to be working only on BCD values) and uses binary 
arithmetic without first clearing the decimal mode flag, which surely 
must be set on entry if BCD work is being done. 


= HLFXY    Halve 4-digit BCD string in registers. 
:JOB       To divide a register held string of 4 BCD digits 
:          by two, setting carry for remainder. 
:ACTION    Halve by right shifting one bit. 
:          Adjust any carry from value 8 to value 5. 
:CPU       6502 
:HARDWARE  None 
:SOFTWARE  None 
:INPUT     X and Y together contain a 4-digit BCD number. 
:          (High order digit in high-nibble X.) 
:OUTPUT    X and Y together contain the BCD result of 
:          input XY divided by 2. Carry contains remainder. 
:ERRORS    Arithmetic error if D = 1 (decimal mode) !! 
:          No check for valid BCD input. 
:REG USE   P X Y 
:STACK USE 1 
:RAM USE   None. 
:LENGTH    27 
:CYCLES    Minimum: 38. Maximum: 49. 
:CLASS 2   *discreet   *interruptable   *promable 
:*****-    *reentrant  *relocatable     -robust 

HLFXY   PHA           :Save A.                           48 
        TXA           :Move high pair of digits to       8A 
        LSR A         :A and divide by 2, then move      4A 
        TAX           :halved pair back to X.            AA 
        AND #$08      :Without affecting Carry flag,     29 08 
        BEQ LOGET     :skip if no half carry, else       F0 03 
        DEX           :subtract 3 to convert carry       CA 
        DEX           :from thousands worth 8 hundreds   CA 
        DEX           :into carry worth 5 hundreds.      CA 
LOGET   TYA           :Get low pair in A and skip if no  98 
        BCC LOHALF    :Carry from hundreds, else add 10  90 02 
        ADC #$9F      :to tens (worth 5 when halved).    69 9F 
LOHALF  ROR A         :Divide by 2, getting tens right   6A 
        TAY           :and move pair back to Y.          A8 
        AND #$08      :Without affecting Carry flag,     29 08 
        BEQ EXIT      :skip if no half carry, else       F0 03 
        DEY           :subtract 3 to convert carry       88 
        DEY           :from tens worth 8 units           88 
        DEY           :into carry worth 5 units.         88 
EXIT    PLA           :Restore A and exit with BCD       68 
        RTS           :value halved and remainder in C.  60 


ARITHMETIC: 16-BIT 


BINARY DIVISION 


DUBDIV by Tim Groves was the very first 6502 datasheet to use the 
pseudo-register set, MO to MF, and did so in the issue that marked 
the start of Sub Set’s second year. 


The normal method of long division is used in DUBDIV and superbly 
demonstrates the superiority of working in binary values. In decimal 
division the divisor may have to be subtracted from the dividend up 
to nine times at each digit place. In binary there is a maximum of one 
subtraction at each bit place. 
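
The one-subtraction-per-bit process is easy to follow in a Python model of 16-bit long division. It is a sketch of the method only, with plain integers standing in for the page zero bytes and registers; the function name is borrowed from the routine for readability.

```python
def dubdiv(dividend, divisor):
    """16-bit unsigned shift-and-subtract division.

    Returns (quotient, remainder). No check is made for a zero divisor.
    """
    quotient = 0
    remainder = 0
    for _ in range(16):                       # one pass per dividend bit
        carry = (dividend >> 15) & 1          # next dividend bit out
        dividend = (dividend << 1) & 0xFFFF
        remainder = ((remainder << 1) | carry) & 0xFFFF
        if remainder >= divisor:              # at most one subtraction
            remainder -= divisor
            carry = 1
        else:
            carry = 0
        quotient = ((quotient << 1) | carry) & 0xFFFF
    return quotient, remainder


print(dubdiv(1000, 7))     # (142, 6)
print(dubdiv(1234, 0))     # (65535, 1234)
```

With a zero divisor every comparison succeeds and nothing is subtracted, so the model returns a quotient of $FFFF and the dividend as remainder.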


:JOB       To divide one 16-bit unsigned integer by another 
:          and return the 16-bit unsigned integer quotient 
:          and 16-bit remainder. 
:ACTION    Clear remainder and quotient accumulators. 
:          FOR count of dividend bits: 
:          [ Dividend = dividend * 2. 
:            Carry = dividend overflow bit. 
:            Remainder = remainder * 2 + Carry. 
:            IF remainder >= divisor: 
:            THEN: [ Remainder = remainder - divisor. 
:                    Carry = 1. ] 
:            ELSE: [ Carry = 0. ] 
:            Quotient = quotient * 2 + Carry. ] 
:CPU       6502 
:HARDWARE  None 
:SOFTWARE  None 
:INPUT     M2,3 contains the dividend. 
:          M4,5 contains the divisor. 
:OUTPUT    M0,1 contains the integer quotient. 
:          M2,3 and A,Y contains the remainder. 
:          P is changed. 
:ERRORS    If M4,5 = 0 then the quotient in M0,1 = $FFFF 
:          and the remainder = input dividend. 
:          No check for division by zero. 
:REG USE   P A Y 
:STACK USE 2 
:RAM USE   M0 M1 M2 M3 M4 M5 
:LENGTH    58 
:CYCLES    Minimum: 887. Maximum: 1191. 
:CLASS 2   -discreet   *interruptable   *promable 
:-****-    *reentrant  *relocatable     -robust 

QUOT    =  M0         :2-bytes for quotient output. 
REM     =  M2         :2-bytes for remainder output. 
DVND    =  M2         :2-byte stored dividend input. 
DVSR    =  M4         :2-byte stored divisor input. 

DUBDIV  TXA           :Save X for use as bit count.      8A 
        PHA           :                                  48 
        CLD           :Ensure binary subtractions.       D8 
        LDA #0        :Clear remainder-accumulator       A9 00 
        TAY           :hi-byte and lo-byte.              A8 
        STA QUOT+1    :Clear quotient hi-byte            85 M1 
        STA QUOT      :and lo-byte.                      85 M0 
        LDX #16       :Set count for 16 dividend bits.   A2 10 
NXTBIT  ASL DVND      :Shift dividend up one bit,        06 M2 
        ROL DVND+1    :moving next bit out to Carry.     26 M3 
        PHA           :Save accumulator hi-byte          48 
        TYA           :and enable accumulator lo-byte.   98 
        ROL A         :Shift remainder up, getting new   2A 
        TAY           :dividend bit. Save lo-byte        A8 
        PLA           :and re-enable hi-byte.            68 
        ROL A         :Complete accumulator shift up.    2A 
        CMP DVSR+1    :Will divisor subtract?            C5 M5 
        BEQ CHEKLO    :Test lo-bytes if hi-bytes equal.  F0 0C 
        BCC QBIT      :No divide in at this bit place.   90 0E 
SUBDIV  PHA           :Result will be positive. Save     48 
        TYA           :accumulator hi-byte, enable       98 
        SBC DVSR      :lo-byte and subtract divisor      E5 M4 
        TAY           :lo-byte (C = 1 from CMP or CPY).  A8 
        PLA           :Re-enable hi-byte and subtract    68 
        SBC DVSR+1    :divisor. Non-negative result      E5 M5 
        BCS QBIT      :so C = 1 always.                  B0 04 
CHEKLO  CPY DVSR      :Will subtraction give negative    C4 M4 
        BCS SUBDIV    :result? Subtract only if not.     B0 F2 
QBIT    ROL QUOT      :Rotate C in, writing quotient     26 M0 
        ROL QUOT+1    :result bit at correct bit place.  26 M1 
        DEX           :Repeat for all 16 bits            CA 
        BNE NXTBIT    :of dividend.                      D0 DB 
        STA REM+1     :Store remainder to page           85 M3 
        STY REM       :zero output variable.             84 M2 
        PLA           :Restore X from stack              68 
        TAX           :via A.                            AA 
        LDA REM+1     :Recover remainder to AY.          A5 M3 
        RTS           :Exit, division completed.         60 

BINARY MULTIPLICATION 


No corresponding multiplication routine accompanied DUBDIV in 
Sub Set but since the process is no less essential than division for 
16-bit work, I include one here. DUBMUL is compatible with 
DUBDIV, using the same 6-byte block of page zero pseudo-registers 
and returning part of the result in registers A and Y.

= DUBMUL   16-bit unsigned multiplication.

:JOB       To multiply two 16-bit unsigned integers and
:          return the 32-bit unsigned integer product.
:ACTION    Clear product accumulators.
:          FOR count of multiplier bits:
:          [ Product = product * 2.
:            Multiplier = multiplier * 2.
:            Carry = multiplier overflow bit.
:            IF Carry = 1 THEN:
:            [ Product = product + multiplicand. ] ]
:CPU       6502
:HARDWARE  None.
:SOFTWARE  None.
:INPUT     M2,3 contains the multiplier.
:          M4,5 contains the multiplicand.
:OUTPUT    M0,1,2,3 contains the product.
:          A,Y contains the product highest two bytes.
:          P is changed.
:ERRORS    None.
:REG USE   P A Y
:STACK USE 2
:RAM USE   M0 M1 M2 M3 M4 M5
:LENGTH    62
:CYCLES    Minimum: 727. Maximum: 1271.
:CLASS     -discreet *interruptable *promable
:          *reentrant *relocatable *robust

PROD    =  M0           :4-bytes for product output.
MPLR    =  M2           :2-byte stored multiplier input.
MCND    =  M4           :2-byte stored multiplicand input.

DUBMUL  TXA             :Save X for use as bit count.       8A
        PHA             :                                   48
        CLD             :Ensure binary additions.           D8
        LDA #0          :Clear product-accumulator          A9 00
        TAY             :highest two bytes A and Y, and     A8
        STA PROD+1      :lowest two bytes, M1               85 M1
        STA PROD        :and M0.                            85 M0
        LDX #16         :Set count for 16 multiplier bits.  A2 10
NEXTBT  ASL PROD        :Shift product up one bit,          06 M0
        ROL PROD+1      :moving next bit out to Carry.      26 M1
        PHA             :Save product hi-byte and           48
        TYA             :enable product next highest byte.  98
        ROL A           :Shift product byte up one bit      2A
        TAY             :Save next highest byte and         A8
        PLA             :re-enable hi-byte. Full product    68
        ROL A           :shift up to next bit place.        2A
        ASL MPLR        :Shift multiplier up one bit,       06 M2
        ROL MPLR+1      :moving next bit out to Carry.      26 M3
        BCC ENDTST      :Skip if no add this bit place.     90 16
        CLC             :Prepare to add, no carry in.       18
        PHA             :Save product hi-byte.              48
        LDA PROD        :Add multiplicand lowest            A5 M0
        ADC MCND        :byte to product lowest             65 M4
        STA PROD        :byte.                              85 M0
        LDA PROD+1      :Add multiplicand next lowest       A5 M1
        ADC MCND+1      :byte to product next lowest        65 M5
        STA PROD+1      :byte.                              85 M1
        PLA             :Re-enable product hi-byte.         68
        BCC ENDTST      :Skip if no carry to hi-bytes.      90 05
        INY             :Add in carry by increment which    C8
                        :doesn't affect Carry flag.
        BNE ENDTST      :Skip if no carry to hi-byte,       D0 02
        ADC #0          :else use C = 1 from lo-bytes.      69 00
ENDTST  DEX             :Repeat for all 16 bits             CA
        BNE NEXTBT      :of multiplier.                     D0 D7
        STA PROD+3      :Store product highest two bytes    85 M3
        STY PROD+2      :to page zero output variable.      84 M2
        PLA             :Restore X from stack               68
        TAX             :via A.                             AA
        LDA PROD+3      :Recover product hi-byte to A.      A5 M3
        RTS             :Exit, multiplication completed.    60
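
The shift-and-add process in the DUBMUL ACTION section can be checked away from the 6502 with a short model. The sketch below is in Python, not the book's assembly, and the names dubmul and split_bytes are illustrative assumptions; it follows the same order of steps (double the product, double the multiplier, add the multiplicand when a 1 bit falls out) but makes no attempt to reproduce register use or timing.

```python
def dubmul(multiplier, multiplicand):
    """Model of a 16 x 16 -> 32 bit unsigned shift-and-add multiply."""
    product = 0
    for _ in range(16):                           # one pass per multiplier bit
        product = (product << 1) & 0xFFFFFFFF     # product = product * 2
        carry = (multiplier >> 15) & 1            # bit about to be shifted out
        multiplier = (multiplier << 1) & 0xFFFF   # multiplier = multiplier * 2
        if carry:                                 # add only where the bit was 1
            product = (product + multiplicand) & 0xFFFFFFFF
    return product


def split_bytes(product):
    """Four product bytes, lowest first, the order used for M0 to M3."""
    return [(product >> (8 * i)) & 0xFF for i in range(4)]


print(hex(dubmul(0xFFFF, 0xFFFF)))
print(split_bytes(dubmul(300, 200)))
```

In this model the bytes at positions 2 and 3 of split_bytes correspond to the values the routine also returns in Y and A.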
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17 


Arithmetic: 
32-bit 


A 32-bit (4-byte) number can have any value between 0 and 2^32 - 1
(4,294,967,295 or $FF FF FF FF), if unsigned or absolute
arithmetic is used. In 2's complement notation, the usual method
of dealing with sign in integer arithmetic, 32 bits have a range of
-2^31 (-2,147,483,648 or $80 00 00 00) to 2^31 - 1 (2,147,483,647 or
$7F FF FF FF). Bit 31, the highest bit, shows the sign of the
number: 0 is positive and 1 is negative. A 2's complement number
can be sign-changed (i.e. switched from positive to negative, or
vice versa) by subtracting it from zero (negation). An alternative
method is to change the state of each individual bit (complement)
and then add 1.
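
The two sign-change methods just described give the same 32-bit pattern, which a few lines of Python can confirm. This is an illustrative sketch only; the function names are assumptions and are not routines from the book.

```python
MASK32 = 0xFFFFFFFF


def negate_by_subtraction(value):
    """Subtract the 32-bit pattern from zero, keeping 32 bits."""
    return (0 - value) & MASK32


def negate_by_complement(value):
    """Invert every bit, then add 1, keeping 32 bits."""
    return ((value ^ MASK32) + 1) & MASK32


def to_signed(value):
    """Read a 32-bit pattern as a 2's complement number."""
    return value - (1 << 32) if value & 0x80000000 else value


for pattern in (1, 0x7FFFFFFF, 0xFFFFFFFB, 0x80000000):
    assert negate_by_subtraction(pattern) == negate_by_complement(pattern)
    print(hex(pattern), to_signed(pattern), to_signed(negate_by_subtraction(pattern)))
```

The last pattern, $80000000, negates to itself: it is the one value with no positive counterpart in 32 bits.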





A SIGNED ARITHMETIC SUITE 


The 5 routines included here form a complete suite for performing 
simple integer arithmetic on 32-bit signed numbers. Along with the 
32-bit conversion routines in chapter 14, which convert 32-bit values 
to and from signed ASCII decimal, and routines for keyboard input 
and screen display (both of which should be in your system 
software), you have the basis of a calculator working to 9 decimal 
digits precision. 


The initial 6502 work on this suite was by Dennis May who provided 
Sub Set with the six routines, SBAD4 and SADB4 (see chapter 14), 
SNEG4, ABS4, SMUL4 and SDIV4. These were improved by Vincent 
Fojut to the 8 routines, SBAD4, SADB4Q and SADB4S (given in 
chapter 14), SNEG4, ABS4, ABSNEG, SMUL4 and SDIV4. 


INTERFACING 


The suite has a standard Input/Output interfacing protocol. It uses 
the full block of 16 page zero pseudo-registers, named M0 to MF for
Sub Set Datasheet purposes (see Introduction) with the following
variable assignments:

M0 M1 M2 M3
    Primary 32-bit accumulator (AZA). Used for the binary
    representation in the conversion routines, the value to be
    negated in the negate routine and for output of product or
    quotient.

M4 M5 M6 M7
    Secondary 32-bit accumulator (AZB). Used for input of
    multiplier or dividend.

M8 M9 MA MB
    Tertiary 32-bit accumulator (AZC). Used for input of
    multiplicand or divisor and for output of remainder.

MC MD
    String Pointer. Used for the address of the source or
    destination ASCII decimal digit string in the conversion
    routines.

ME and MF
    Temporary storage locations. Used for storing index register
    contents and product, quotient and remainder signs.


In all cases the low-order byte of an address or value is stored in the 
lowest memory location of the variable. 
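
As a rough illustration of the low-byte-first rule, the Python sketch below treats the 16 pseudo-registers as a 16-byte array. The helper names store32 and load32 and the use of a bytearray are assumptions made for this example; only the offsets of AZA, AZB and AZC within the block come from the assignments listed above.

```python
AZA, AZB, AZC = 0x0, 0x4, 0x8     # offsets of M0, M4 and M8 within the block


def store32(block, offset, value):
    """Store a 32-bit value with its low-order byte at the lowest location."""
    for i in range(4):
        block[offset + i] = (value >> (8 * i)) & 0xFF


def load32(block, offset):
    """Rebuild the 32-bit value from four bytes stored low byte first."""
    return sum(block[offset + i] << (8 * i) for i in range(4))


block = bytearray(16)             # stands in for M0 to MF
store32(block, AZB, 0x12345678)
print([hex(b) for b in block[AZB:AZB + 4]])
print(hex(load32(block, AZB)))
```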


Errors are flagged by setting Carry — clearing Carry for a successful 
operation — in any routine where invalid input is possible. 


ABSOLUTE AND NEGATIVE VALUES 


ABSNEG doesn’t properly belong in the suite: it returns the absolute 
value of a 32-bit number at 4 page zero locations addressed by X; the 
other members of the suite all act on variables within the 16-byte 
block. ABS4 calls ABSNEG twice to convert the secondary and 
tertiary accumulators to their absolute values after first calculating 
the sign of their product or quotient by exclusive-ORing the 2 sign 
bits. Within the suite, ABSNEG is merely a local subroutine of ABS4; 
outside it could be very useful in many contexts. 


SNEG4 performs much the same operation of subtracting a 32-bit 
value from 0 as does ABSNEG but does it whatever the sign of input 
value. It also moves the result sign into the Carry flag but in doing so 
it unfortunately destroys the information about the zero status of the 
negated variable; the Carry out of the negation is set only if the 
variable is zero. 


= ABSNEG   Absolute value of a 4-byte signed integer.

:JOB       To return the absolute value of a 32-bit signed
:          integer using two's complement notation.
:ACTION    IF number is negative THEN:
:          [ Index from low order byte.
:            Ensure no borrow in.
:            FOR count of 4:
:            [ Accumulator = 0 - indexed byte.
:              Indexed byte = accumulator.
:              Index next byte. ] ]
:CPU       6502
:HARDWARE  None.
:SOFTWARE  None.
:INPUT     4 page zero locations at 0,X to 3,X contain the
:          number.
:OUTPUT    X = input X + 4.
:          4 page zero locations at -4,X to -1,X contain
:          the absolute value of the input number.
:          A = number hi-byte. Y = 0. P is changed.
:ERRORS    The absolute value of $80000000 cannot be
:          represented in 32-bit 2's complement notation.
:          Arithmetic error if D = 1 (decimal mode).
:REG USE   P A X Y
:STACK USE None.
:RAM USE   None.
:LENGTH    18
:CYCLES    83 (or 13 if positive value input).
:CLASS 2   -discreet *interruptable *promable
:          *reentrant *relocatable -robust

ABSNEG  LDA 3,X         :Test sign of input number          B5 03
        BPL EXIT        :and skip if already positive.      10 0D
        LDY #4          :Set count for 4 bytes.             A0 04
        SEC             :No borrow in to subtraction.       38
ABNGLP  LDA #0          :Loop, subtracting all four         A9 00
        SBC 0,X         :bytes in turn from 0, with any     F5 00
        STA 0,X         :borrows from previous bytes, and   95 00
        INX             :restore negated result to same     E8
        DEY             :locations. X indexes bytes and     88
        BNE ABNGLP      :Y counts 'em off.                  D0 F6
EXIT    RTS             :Exit, number absolute.             60
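
The byte-by-byte subtraction with a borrow carried from one byte to the next can be followed in a small Python model. This is a sketch under stated assumptions: mem is a bytearray standing in for page zero, x is the index of the low byte, and the function name absneg is borrowed for illustration only; the model changes the four bytes in place and does not try to reproduce the exit values of A, X or Y.

```python
def absneg(mem, x):
    """Negate the 4-byte value at mem[x..x+3] in place if it is negative."""
    if mem[x + 3] & 0x80 == 0:
        return                            # sign bit clear: already positive
    borrow = 0                            # no borrow in to the lowest byte
    for i in range(4):
        diff = 0 - mem[x + i] - borrow    # 0 minus byte minus earlier borrow
        mem[x + i] = diff & 0xFF          # write result back to same location
        borrow = 1 if diff < 0 else 0     # pass any borrow to the next byte


value = bytearray([0x00, 0xFF, 0xFF, 0xFF])   # -256, low byte first
absneg(value, 0)
print(list(value))
```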

= ABS4     Absolute value of two 4-byte signed integers.

:JOB       To return the absolute value of two 32-bit
:          signed integers using two's complement notation
:          and the sign difference of the input values.
:ACTION    Sign difference = sign 1st no. EOR sign 2nd no.
:          Index 1st number and call absolute routine.
:          Index 2nd number and call absolute routine.
:CPU       6502
:HARDWARE  None.
:SOFTWARE  "ABSNEG" - a subroutine to test the 4-byte page
:          zero value indexed by X and negate if negative.
:INPUT     M4,5,6,7 contains 1st 32-bit number.
:          M8,9,A,B contains 2nd 32-bit number.
:OUTPUT    Both M4,5,6,7 and M8,9,A,B contain the absolute
:          values of the numbers input. X = 0.
:          ME contains the sign difference of the two input
:          numbers (bit 7 of ME = 1 if the signs were different).
:          P and A are changed.
:ERRORS    The absolute value of $80000000 cannot be
:          represented in 32-bit 2's complement notation.
:REG USE   P A X
:STACK USE 2
:RAM USE   M4 to MB and ME
:LENGTH    19
:CYCLES    33 + 2 * ABSNEG cycles.
:CLASS 2   -discreet *interruptable *promable
:          *reentrant *relocatable -robust

AZB     =  M4           :4-byte accumulator holding 1st number.
AZC     =  M8           :4-byte accumulator holding 2nd number.
SGN     =  ME           :1 byte to output sign difference of
                        :the two input numbers.

ABS4    LDA AZB+3       :Exclusive-or 1st no. hi-byte       A5 M7
        EOR AZC+3       :with 2nd no. hi-byte, storing      45 MB
        STA SGN         :sign difference in bit 7,ME.       85 ME
        LDX #AZB        :Index 1st number from $00 and      A2 M4
        JSR ABSNEG      :convert it to absolute value.      20 lo hi
        LDX #AZC        :Index 2nd number from $00 and      A2 M8
        JSR ABSNEG      :convert it to absolute value.      20 lo hi
        LDX #0          :Exit with X cleared to a           A2 00
        RTS             :useful value of zero.              60
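
The use of exclusive-OR to find the sign of a product or quotient before the operands are made positive can be sketched in Python. The names abs32 and abs4 are assumptions for this example, and the model works on whole 32-bit patterns instead of page zero bytes; only bit 7 of the returned sign byte is meaningful, as in the routine.

```python
MASK32 = 0xFFFFFFFF
SIGN_BIT = 0x80000000


def abs32(value):
    """Absolute value of a 32-bit 2's complement pattern."""
    return (0 - value) & MASK32 if value & SIGN_BIT else value


def abs4(first, second):
    """Return both absolute values and the EOR of the two high bytes."""
    sign_byte = ((first ^ second) >> 24) & 0xFF   # bit 7 set if signs differ
    return abs32(first), abs32(second), sign_byte


a, b, sign = abs4(0xFFFFFFFB, 3)                  # -5 and +3
print(a, b, "signs differ" if sign & 0x80 else "signs agree")
```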


= SNEG4    Negate 4-byte signed integer.

:JOB       To negate a 32-bit signed integer using two's
:          complement notation, returning sign status of
:          the result.
:ACTION    Index from low order byte.
:          Ensure no borrow in.
:          FOR count of 4:
:          [ Accumulator = 0 - indexed byte.
:            Indexed byte = accumulator.
:            Index next byte. ]
:          Move accumulator highest bit (sign) to Carry.
:CPU       6502
:HARDWARE  None.
:SOFTWARE  None.
:INPUT     M0,1,2,3 contains 32-bit number to be negated.
:OUTPUT    M0,1,2,3 contains negated 32-bit number.
:          C = number sign bit (positive if C = 0).
:          X = 0. A is changed.
:ERRORS    The negative value, $80000000, cannot be negated
:          since it has no positive equivalent in 32-bit
:          2's complement notation.
:          Arithmetic error if D = 1 (decimal mode).
:REG USE   P A X
:STACK USE None.
:RAM USE   M0 M1 M2 M3
:LENGTH    14
:CYCLES    71
:CLASS 2   -discreet *interruptable *promable
:          *reentrant *relocatable -robust

AZA     =  M0           :4-byte accumulator holding number.

SNEG4   LDX #-4         :Index from lowest byte.            A2 FC
        SEC             :To subtract with no borrow in.     38
SNEGLP  LDA #0          :Loop, subtracting M0, M1, M2       A9 00
        SBC AZA+4,X     :and M3 in turn from 0, with any    F5 M4
        STA AZA+4,X     :borrows from previous bytes, and   95 M4
        INX             :restore negated result to same     E8
        BMI SNEGLP      :locations. Leave A = hi-byte.      30 F7
        ASL A           :Move negated value sign bit to     0A
        RTS             :Carry and exit.                    60
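
SNEG4 counts its four bytes with a negative index: X starts at -4 ($FC), the operand is written as AZA+4,X, and the loop ends when X reaches zero, so no separate counter is needed. The Python model below is a sketch of that indexing idea only. It assumes a 256-byte bytearray for page zero and a hypothetical parameter aza for the address of M0; the wrap of the indexed address at $FF is modelled with a mask.

```python
def sneg4(mem, aza=0):
    """Negate the 4 bytes at mem[aza..aza+3]; return the sign bit as Carry."""
    x = 0xFC                                  # index register = -4
    carry = 1                                 # Carry set: no borrow in
    while True:
        addr = (aza + 4 + x) & 0xFF           # zero page indexed address wraps
        diff = 0 - mem[addr] - (1 - carry)    # subtract byte and any borrow
        mem[addr] = diff & 0xFF
        carry = 0 if diff < 0 else 1          # Carry clear means a borrow
        a = mem[addr]                         # accumulator ends as the hi-byte
        x = (x + 1) & 0xFF                    # step the index
        if x & 0x80 == 0:                     # loop only while X is negative
            break
    return (a >> 7) & 1                       # shift sign bit out to Carry


page_zero = bytearray(256)
page_zero[0:4] = bytes([5, 0, 0, 0])          # +5, low byte first
sign = sneg4(page_zero)
print(list(page_zero[0:4]), sign)
```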


MULTIPLICATION AND DIVISION 


Both multiplication and division are straightforward, using the same 
‘long’ operations as the 16-bit operations of the last chapter for the 
main process. The difference lies in the need to convert possibly 
negative values to a positive form (i.e. find their absolute values) 
before the operation can be performed and to negate the appropriate 
result variable if any of the product, quotient or remainder signs are
negative.


= SMUL4    4-byte signed integer multiplication.

:JOB       To multiply two 32-bit signed integers using
:          two's complement notation returning the 32-bit
:          signed product or overflow information.
:ACTION    ON overflow: [ Set overflow flag. Exit. ]
:          Product sign = m'plier sign EOR m'cand sign.
:          Multiplier = absolute multiplier.
:          Multiplicand = absolute multiplicand.
:          Product = 0.
:          FOR multiplier bits:
:          [ Product = product * 2.
:            Multiplier = multiplier * 2.
:            Carry = multiplier overflow bit.
:            IF Carry = 1 THEN:
:            [ Product = product + multiplicand. ] ]
:          IF product sign negative THEN:
:          [ Product = 0 - product. ]
:          Set valid result flag.
:CPU       6502
:HARDWARE  None.
:SOFTWARE  "ABS4" - a subroutine to return the sign
:          difference and absolute values of two 4-byte
:          signed integers.
:          "SNEG4" - a subroutine to negate a 4-byte
:          signed integer.
:INPUT     M4,5,6,7 holds the 32-bit signed multiplier.
:          M8,9,A,B holds the 32-bit signed multiplicand.
:OUTPUT    All registers changed. ME changed.
:          C = 0: multiplication completed.
:            M0,1,2,3 holds the 32-bit signed product.
:            M4,5,6,7 = 0.
:            M8,9,A,B holds the absolute multiplicand.
:          C = 1: product overflow.
:            State of M0 to MB uncertain.
:ERRORS    Arithmetic error if D = 1 (decimal mode).
:REG USE   P A X Y
:STACK USE 4
:RAM USE   M0 to MB and ME
:LENGTH    63
:CYCLES    Minimum: 1756. Maximum: 4134.
:CLASS 2   -discreet *interruptable *promable
:          *reentrant *relocatable -robust

AZA     =  M0           :4-byte product accumulator.
AZB     =  M4           :4-byte multiplier accumulator.
AZC     =  M8           :4-byte multiplicand accumulator.
SGN     =  ME           :1 byte containing product sign after
                        :call to ABS4.

SMUL4   JSR ABS4        :Get sign difference (i.e.          20 lo hi
        STX AZA         :product sign) of m'plier and       86 M0
        STX AZA+1       :m'cand, their absolute values      86 M1
        STX AZA+2       :and clear X. Use X to clear        86 M2
        STX AZA+3       :product accumulator.               86 M3
        LDY #32         :Set count for 32 bits.             A0 20
SMLP    ASL AZA         :Shift partial product left         06 M0
        ROL AZA+1       :by one bit (product * 2)           26 M1
        ROL AZA+2       :to next bit place, ready for       26 M2
        ROL AZA+3       :add in of multiplicand.            26 M3
        BMI SMOVFW      :Overflow if value goes to          30 26
        BCS SMOVFW      :32 or 33 bits.                     B0 24
        ASL AZB         :Shift multiplier left by one       06 M4
        ROL AZB+1       :bit, moving next bit out to        26 M5
        ROL AZB+2       :Carry to determine if m'cand       26 M6
        ROL AZB+3       :added to product at this bit       26 M7
        BCC SMLPT       :place, skipping if not.            90 0E
        LDX #-4         :Index from lowest bytes.           A2 FC
        CLC             :No carry in to addition.           18
SMADD   LDA AZA+4,X     :Add multiplicand to product        B5 M4
        ADC AZC+4,X     :at correct place value for         75 MC
        STA AZA+4,X     :bit place of multiplier bit.       95 M4
        INX             :Index next higher bytes and        E8
        BMI SMADD       :repeat for all 4 bytes.            30 F7
        BVS SMOVFW      :Exit if overflow.                  70 0C
SMLPT   DEY             :Repeat for all 32 bits             88
        BNE SMLP        :of multiplier.                     D0 D9
        BIT SGN         :Test stored sign difference        24 ME
        BPL SMEXIT      :(i.e. product sign), skip if       10 03
        JSR SNEG4       :positive, else negate product.     20 lo hi
SMEXIT  CLC             :Set valid result flag, C = 0,      18
        RTS             :and exit, multiplication done.     60
SMOVFW  SEC             :Set product overflow flag,         38
        RTS             :C = 1, and exit on error.          60
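
A Python model makes it easier to see where the signed multiply reports overflow: once when the doubled partial product reaches bit 31 or beyond, and once when adding the multiplicand carries it there. The sketch below is illustrative only. The names smul4 and abs32 are assumptions, the model works on whole 32-bit patterns, it returns (carry, product) in place of setting the Carry flag, and it assumes neither input is $80000000.

```python
MASK32 = 0xFFFFFFFF
SIGN_BIT = 0x80000000


def abs32(value):
    """Absolute value of a 32-bit 2's complement pattern."""
    return (0 - value) & MASK32 if value & SIGN_BIT else value


def smul4(multiplier, multiplicand):
    """Return (0, product) on success or (1, None) on overflow."""
    negative = (multiplier ^ multiplicand) & SIGN_BIT   # product sign
    multiplier = abs32(multiplier)
    multiplicand = abs32(multiplicand)
    product = 0
    for _ in range(32):
        product <<= 1                         # product = product * 2
        if product > 0x7FFFFFFF:              # grew into bit 31 or beyond
            return 1, None
        carry = multiplier >> 31              # next multiplier bit
        multiplier = (multiplier << 1) & MASK32
        if carry:
            product += multiplicand
            if product > 0x7FFFFFFF:          # addition overflowed the sign bit
                return 1, None
    if negative:
        product = (0 - product) & MASK32      # restore the product sign
    return 0, product


print(smul4(3, 0xFFFFFFFC))                   # 3 * -4
print(smul4(0x10000, 0x10000))                # too big for 32 signed bits
```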


= SDIV4    4-byte signed integer division.

:JOB       To divide one 32-bit signed integer by another,
:          using two's complement notation, and returning
:          the 32-bit signed quotient and 32-bit remainder
:          or division by zero information.
:ACTION    IF divisor = 0:
:          THEN:
:          [ Set division by zero flag. ]
:          ELSE:
:          [ Remainder sign = dividend sign.
:            Quotient sign = sign (divisor EOR dividend).
:            Acc-lo = absolute dividend. Acc-hi = 0.
:            Divisor = absolute divisor.
:            FOR acc-lo bits:
:            [ Acc = acc * 2.
:              IF acc-hi >= divisor
:              THEN: [ Acc-hi = acc-hi - divisor.
:                      Quotient = quotient * 2 + 1. ]
:              ELSE: [ Quotient = quotient * 2. ] ]
:            Remainder = acc-hi.
:            IF remainder sign negative THEN:
:            [ Remainder = 0 - remainder. ]
:            IF quotient sign negative THEN:
:            [ Quotient = 0 - quotient. ]
:            Set valid result flag. ]
:CPU       6502
:HARDWARE  None.
:SOFTWARE  "ABS4" - a subroutine to return the sign
:          difference and absolute values of two 4-byte
:          signed integers.
:          "SNEG4" - a subroutine to negate a 4-byte
:          signed integer.
:INPUT     M4,5,6,7 holds the 32-bit signed dividend.
:          M8,9,A,B holds the 32-bit signed divisor.
:OUTPUT    C = 0: division completed.
:            M0,1,2,3 holds the 32-bit signed quotient.
:            M4,5,6,7 holds absolute value of quotient.
:            M8,9,A,B holds the 32-bit signed remainder.
:            All registers changed. ME and MF changed.
:          C = 1: division by zero error.
:            Input divisor (M8,9,A,B) = 0.
:            A = 0. No other variable changed.
:ERRORS    Arithmetic error if D = 1 (decimal mode).
:REG USE   P A X Y
:STACK USE 4
:RAM USE   M0 to MB, ME and MF
:LENGTH    106
:CYCLES    Minimum: 4237. Maximum: 6505.
:CLASS 2   -discreet *interruptable *promable
:          *reentrant *relocatable -robust

:...Working variables.

SGN     =  ME           :1 byte to hold quotient sign.
SGNR    =  MF           :1 byte to hold remainder sign.
AZA     =  M0           :4-byte dividend high half accumulator.
AZB     =  M4           :4-byte dividend low half accumulator,
                        :quotient shifted in to low half as
                        :dividend shifts up to high half.
AZC     =  M8           :4-byte divisor accumulator.

SDIV4   LDA AZC         :Check if divisor is zero           A5 M8
        ORA AZC+1       :by ORing all divisor bytes         05 M9
        ORA AZC+2       :into A, leaving A = 0 only         05 MA
        ORA AZC+3       :if all divisor bytes = 0.          05 MB
        BEQ SDZERO      :Exit immediately if zero.          F0 5E
        LDA AZB+3       :Store dividend sign as sign        A5 M7
        STA SGNR        :of remainder.                      85 MF
        JSR ABS4        :Get sign difference of divisor     20 lo hi
        STX AZA         :and dividend as quotient sign,     86 M0
        STX AZA+1       :their absolute values and          86 M1
        STX AZA+2       :clear X. Use X to clear high       86 M2
        STX AZA+3       :half of dividend accumulator.      86 M3
        LDY #32         :Set count for 32 bits.             A0 20
SDLP    ASL AZB         :Shift low half dividend left       06 M4
        ROL AZB+1       :by one bit moving next             26 M5
        ROL AZB+2       :dividend bit out into Carry        26 M6
        ROL AZB+3       :for move into high half.           26 M7
        ROL AZA         :Rotate dividend high half          26 M0
        ROL AZA+1       :left by one bit getting next       26 M1
        ROL AZA+2       :dividend bit into right bit        26 M2
        ROL AZA+3       :place for divisor subtraction.     26 M3
        LDX #-4         :Index from lowest bytes.           A2 FC
        SEC             :No borrow in to subtraction.       38
SDSUB   LDA AZA+4,X     :Subtract divisor from high         B5 M4
        SBC AZC+4,X     :half of dividend at current        F5 MC
        STA AZA+4,X     :dividend bit place.                95 M4
        INX             :Index next higher bytes and        E8
        BMI SDSUB       :repeat for all 4 bytes.            30 F7
        BCS SDQBIT      :Skip if subtracts okay.            B0 0D
        LDX #-4         :Index from lowest bytes.           A2 FC
SDADD   LDA AZA+4,X     :Add divisor back to dividend       B5 M4
        ADC AZC+4,X     :high half to restore it to         75 MC
        STA AZA+4,X     :value held before subtraction.     95 M4
        INX             :Index next higher bytes and        E8
        BMI SDADD       :repeat for all 4 bytes.            30 F7
        BEQ SDLPT       :Leave quotient bit = zero.         F0 02
SDQBIT  INC AZB         :Sub. gone: set quotient bit.       E6 M4
SDLPT   DEY             :Repeat for all 32 bits             88
        BNE SDLP        :of dividend.                       D0 D0
        BIT SGNR        :Test stored remainder sign,        24 MF
        BPL SDTFR       :skip if positive,                  10 03
        JSR SNEG4       :else negate remainder.             20 lo hi
SDTFR   LDX #-4         :Index from lowest bytes.           A2 FC
SDTFRL  LDA AZA+4,X     :Move remainder from dividend       B5 M4
        STA AZC+4,X     :high half to remainder I/O.        95 MC
        LDA AZB+4,X     :Move quotient from dividend        B5 M8
        STA AZA+4,X     :low half to quotient I/O.          95 M4
        INX             :Index next higher bytes and        E8
        BNE SDTFRL      :repeat for 4 bytes of each.        D0 F5
        BIT SGN         :Test stored quotient sign,         24 ME
        BPL SDEXIT      :skip if positive,                  10 03
        JSR SNEG4       :else negate quotient.              20 lo hi
SDEXIT  CLC             :Set valid result flag, C = 0,      18
        RTS             :and exit, division done.           60
SDZERO  SEC             :Set division by zero flag,         38
        RTS             :C = 1, and exit on error.          60
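
The division loop is a shift-and-subtract (restoring) division on a 64-bit accumulator whose low half starts as the dividend and ends as the quotient. The Python sketch below models that process on whole 32-bit patterns. The names sdiv4 and abs32 are assumptions, the result is returned as (carry, quotient, remainder) in place of flags and page zero variables, and the "add back" step of the 6502 code appears here simply as not keeping a failed trial subtraction.

```python
MASK32 = 0xFFFFFFFF
SIGN_BIT = 0x80000000


def abs32(value):
    """Absolute value of a 32-bit 2's complement pattern."""
    return (0 - value) & MASK32 if value & SIGN_BIT else value


def sdiv4(dividend, divisor):
    """Return (0, quotient, remainder), or (1, None, None) if divisor is 0."""
    if divisor == 0:
        return 1, None, None                        # division by zero flag
    remainder_negative = dividend & SIGN_BIT        # remainder takes dividend sign
    quotient_negative = (dividend ^ divisor) & SIGN_BIT
    acc_lo = abs32(dividend)                        # becomes the quotient
    acc_hi = 0                                      # becomes the remainder
    divisor = abs32(divisor)
    for _ in range(32):
        carry = acc_lo >> 31                        # next dividend bit
        acc_lo = (acc_lo << 1) & MASK32             # quotient bit starts at 0
        acc_hi = ((acc_hi << 1) | carry) & MASK32
        trial = acc_hi - divisor                    # trial subtraction
        if trial >= 0:
            acc_hi = trial                          # subtraction stands
            acc_lo |= 1                             # set the quotient bit
    remainder = (0 - acc_hi) & MASK32 if remainder_negative else acc_hi
    quotient = (0 - acc_lo) & MASK32 if quotient_negative else acc_lo
    return 0, quotient, remainder


print(sdiv4(7, 2))
print(sdiv4(0xFFFFFFF9, 2))                         # -7 / 2
```

With these sign rules the quotient is truncated towards zero and the remainder carries the sign of the dividend, so -7 divided by 2 gives -3 remainder -1.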
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This index lists the names of the datasheets in alphabetic order, 
the labels of all entry points and local subroutines in the 
datasheets, and the names of subroutines given in the text. 


Routine  Page  Description

ABSNEG   166   Absolute value of a 4-byte signed integer.
ABS4     167   Absolute value of two 4-byte signed integers.
ADC4     135   4-byte addition with carry.
ADD4     135   4-byte addition without carry.
ALSORT   54    ASCII sort of variable length strings.
ASLXY    15    Arithmetic Shift Left XY.
ASL4     135   4-byte Arithmetic Shift Left.

BCNV     155   Base conversion: unsigned ASCII (base 2 to 36) to
               unsigned ASCII (base 2 to 36).
BGCB     158   Binary to or from Gray Code conversion.
BINX     153   32-bit to unsigned ASCII (base 2 to 36) conversion.
BIRCH    44    Break, Interrupt and Relative Call Handler.
BOX      29    Stack registers, block exchange memory/stack.

CASEOF   58    Single byte key CASE routine.
CHKSUM   86    BCD Checksum.
COX      30    Block exchange memory/stack, unstack registers.
CTRPRT   94    Control character name print.
CURT16   132   Integer cube root of a 16-bit unsigned value.
CURT32   134   Integer cube root of a 32-bit unsigned value.

DBLBIN         Double 4-byte value, see SADB4S
DL1S           Delay for one second.
DOT1     104   Put graphics dot information.
DOT2     105   Put graphics dot information.
DOT3     106   Put graphics dot information.
DOT4     108   Put graphics dot information.
DOT5     109   Put graphics dot information.
DUBDIV   161   16-bit unsigned division.
DUBMUL   163   16-bit unsigned multiplication.

ECAL           Calculate error correction (parity) byte.
EFIX           Validate data with error correction byte.
EXAX           Exchange A with X.
EXAY           Exchange A with Y.
EXPD           Expand 12-bit data.
EXSX           Exchange (SP),index register X.
EXSXY          (Extended form of EXSX).
EXXY           Exchange X with Y.

FIND           Find address of memory block.
FOWIA          Find Out Where I'm At (Z-80 code).

GETA           Get stacked A.
GETLOC         Get program location.
GETPC          Get Return Address from stack.
GYCON          Binary nibble to Gray Code conversion.

HLFXY          Halve 4-digit BCD string in registers.

IBT            Intelligent Block Transfer.
INCA           Increment Accumulator A.
INDXY          Register Indirect addressing mode emulator.
INSUB          (Constructed by INDXY), see INDXY

LARCH          (Long Address BIRCH extension), see BIRCH
LFEED          (LSTFMT line feed), see LSTFMT
LSR4           4-byte Logical Shift Right.
LSTFMT         Formatted assembler listing.
LONGBR         16-bit offset, complex-conditional branch.

MAKMSG         Make message from sub-messages.
MATCH          Keyword substring match.

PLXYAP         Pull register set X, Y, A, P and PC.
PLYXAP         Pull register set Y, X, A, P and PC.
POS            (LSTFMT cursor position read), see LSTFMT
PRAY           Print 16-bit value as ASCII decimal.
PSHU           Push specified registers to User Stack.
PULU           Pull specified registers from User Stack.

RANDI          Pseudo-random integer in binary or BCD.
RESTOR         Restore registers and 2 page zero bytes.
RINXY          Register Indirect addressing mode emulator.
RISUB          (Constructed by RINXY), see RINXY
RLAXY          Rotate left by one byte through A, X, Y.
RLTVL          Program relative subroutine call.
RND16B         Compute 16 random bits.
RND16          16-bit pseudo-random number generator.
RND31          31-bit pseudo-random number generator.
RND32          32-bit pseudo-random number generator.
ROLL           (Graphics byte rotation through Carry), see DOT5
ROL4     135   4-byte Rotate Left.
ROR4     135   4-byte Rotate Right.
ROTREX   13    Register rotate, transfer and exchange: toolkit.
RRAXY    13    Rotate right by one byte through A, X, Y.
RRL      77    Rotate memory right long.

SADB4Q   140   ASCII decimal to 4-byte signed integer (quick).
SADB4S   142   ASCII decimal to 4-byte signed integer (short).
SAVE     27    Save registers and 2 page zero bytes.
SBAD4    145   4-byte signed integer to ASCII decimal.
SBC4     135   4-byte subtraction with carry (borrow).
SDIV4    171   4-byte signed integer division.
SLOWUP   6     Switchable delay.
SMUL4    169   4-byte signed integer multiplication.
SNEG4    168   Negate 4-byte signed integer.
SPLIT0   106   Split graphics dot information.
SPLIT3   105   Split graphics dot information.
SQR15    125   Integer square root of a 16-bit signed (positive)
               value.
SQR16    125   Integer square root of a 16-bit unsigned (absolute)
               value.
SQR31    127   Integer square root of a 32-bit signed (positive)
               value.
SQR32    127   Integer square root of a 32-bit unsigned (absolute)
               value.
SQSH     62    Squash 12-bit data.
SUB4     135   4-byte subtraction without carry (borrow).

TABOUT         (LSTFMT tabulation), see LSTFMT
TERAXY   16    Register transfer, exchange and rotate: toolkit.
TEXT     73    Output program embedded text.
TKNIN    66    'Tokenise' ASCII text.
TKNOUT   67    Output expanded 'Tokenised' text.
TRANS    81    Own-space matrix transposition.
TSY      15    Transfer Stack Pointer to Y.
TXY      13    Transfer X to Y.
TYX      13    Transfer Y to X.

UDELAY   3     Universal delay.

XBIN     151   Unsigned ASCII (base 2 to 36) to 32-bit conversion.
XYMOD    19    Modify absolute address operand.
XYMODS   23    Copy instruction with modified address operand.
XYSUB          (Constructed by XYMODS), see XYMODS



INDEX

Absolute addresses viii, xi
ACTION (datasheet section) ix
Algorithm construction 129
Array storage 80
Art of Computer Programming, The 115
ASCII 93, 139, 149
Assembler standards xi, 95
Associated key 57

Bases (2-36) 151, 155
BASIC 57, 95
BBC assembler 95
BBC microcomputer 93, 103
BCD (Binary coded decimal) 113, 159
Bit error 88
Bit inversion 88
Block addressing 40
Break flag 44
Breakpoint 44
Brown, P.J. 116
Bubble sorts 53

Checksums 85
CLASS (datasheet section) x
Clock cycles viii
Clock speed 2
Condition tests 47
Control codes 93
Conversion method 139, 145
CPU (datasheet section) ix
Cube root method 129
CYCLES (datasheet section) x

Datasheets v, ix
Decimal mode flag 159
Delay formula 2
Discreet x
Documentation vi, ix
Dummy instructions 2
Dummy opcode 13
Dynamic addressing 39

Elegance 105
Elegant solution 103
Embedded parameters 28
Embedded text 73
ERRORS (datasheet section) x
Escapes 69
Expansion table 64
Extra addressing modes 11
Extra NOPs 11

Formatted listing 95
Function key 100

Gray Code 155

Hadden, David R. (Jr.) vi
HALT 11
HARDWARE (datasheet section) ix
Hexdump 93
Hex symbol 101
Hughes, Thomas P. vi

INPUT (datasheet section) ix
Intelligent transfer 74
Interfacing protocol 165
Interruptable x

Jargon phrases 71
JOB (datasheet section) ix

Keyword search 51
Knuth, Donald E. 115

LENGTH (datasheet section) x
Leventhal, Lance A. 85
Line feed 100
Look-up tables 106

Matrix transposition 80
Memory block rotation 77
Microcomputer Software Design vi
Modifying code 20
Multiplication by factorisation 116
Multiplication by shifting 116, 139

Negative sign 165

ON..GOSUB.. 57
OUTPUT (datasheet section) x
Overflow flag 149

Page zero viii, xi, 27, 29, 165
Parity coding 88
PCW SubSet v
Potency 115
Positive sign 165
Product sign 166
Program relative addressing 41, 156
Promable x
Pseudo-random sequence 111
Pseudo-registers x, 29, 165

Quicker sorting 57
Quotient sign 166

RAM USE (datasheet section) x
Random binary values 111
Random number formulae 113, 114, 121
Random number theory 114
Re-entrant xi
Register use (datasheet section) x
Relocatable xi
Reset 11
RESTART (Z-80 1-byte subroutine call) 44
Robust xi

Sawin, Dwight H. (III) vi
Seeding 111, 122
SOFTWARE (datasheet section) ix
Software interrupt 44
Space bar 5
Spectral test 115
Square root method 129
Stack exchanges 17
Stacking error 21
Stack page 27
Stack use (datasheet section) x
Stack workspace 149
String termination 64
Subroutines 12
SubSet, see PCW SubSet

Time states, see clock cycles
Timing overheads 2, 12, 140
Tokens 51, 64
Toolkits 12
Two's complement 145, 165

Unspecified instructions 9
Upper-case ASCII 151
U.S. Army Electronics Command vi
User stack 32

VIA (Versatile Interface Adapter) 112

Wait function 5
Weighting 85
Wraparound addressing 74
Wraparound arithmetic 155
Writing Interactive Compilers and Interpreters 166

Zero page, see page zero
6502 Assembly Language Programming 85
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